Overview

Dataset statistics

Number of variables: 36
Number of observations: 63578
Missing cells: 614883
Missing cells (%): 26.9%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 17.5 MiB
Average record size in memory: 288.0 B
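
The percentages in this block follow from the raw counts. The sketch below recomputes two of them as a sanity check. It assumes that "Missing cells (%)" is missing cells divided by variables times observations, and that the total size is the average record size times the row count; the variable names are illustrative and not part of the report.

```python
# Figures copied from the "Dataset statistics" block of this report.
n_variables = 36
n_observations = 63578
missing_cells = 614883
avg_record_bytes = 288.0

# Every observation has one cell per variable.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumed relationship: total size = average record size * number of rows.
total_mib = avg_record_bytes * n_observations / 1024 ** 2

print(f"Total cells: {total_cells}")
print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Total size in memory: {total_mib:.1f} MiB")
```

Both recomputed values round to the figures reported above (26.9% and 17.5 MiB).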

Variable types

Categorical: 24
Numeric: 7
Unsupported: 5

Alerts

TR6_NO has a high cardinality: 63556 distinct values High cardinality
HOUSE_NO has a high cardinality: 4189 distinct values High cardinality
STREET_NAME has a high cardinality: 2516 distinct values High cardinality
SUBMITTED_ON has a high cardinality: 4461 distinct values High cardinality
QEWI_NAME has a high cardinality: 2111 distinct values High cardinality
QEWI_BUS_NAME has a high cardinality: 4694 distinct values High cardinality
QEWI_BUS_STREET_NAME has a high cardinality: 3901 distinct values High cardinality
QEWI_CITY has a high cardinality: 541 distinct values High cardinality
QEWI_ZIP has a high cardinality: 233 distinct values High cardinality
QEWI_NYS_LIC_NO has a high cardinality: 471 distinct values High cardinality
OWNER_NAME has a high cardinality: 6058 distinct values High cardinality
OWNER_BUS_NAME has a high cardinality: 26232 distinct values High cardinality
FILING_DATE has a high cardinality: 4464 distinct values High cardinality
PRIOR_CYCLE_FILING_DATE has a high cardinality: 5274 distinct values High cardinality
FIELD_INSPECTION_COMPLETED_DATE has a high cardinality: 5145 distinct values High cardinality
QEWI_SIGNED_DATE has a high cardinality: 5198 distinct values High cardinality
COMMENTS has a high cardinality: 9038 distinct values High cardinality
CONTROL_NO is highly correlated with CYCLE High correlation
CYCLE is highly correlated with CONTROL_NO High correlation
BIN is highly correlated with BLOCK High correlation
BLOCK is highly correlated with BIN High correlation
LATE_FILING_AMT is highly correlated with FAILURE_TO_FILE_AMT High correlation
FAILURE_TO_FILE_AMT is highly correlated with LATE_FILING_AMT High correlation
FILING_STATUS is highly correlated with FILING_TYPE and 1 other field High correlation
FILING_TYPE is highly correlated with FILING_STATUS High correlation
CURRENT_STATUS is highly correlated with FILING_STATUS High correlation
CONTROL_NO is highly correlated with FILING_TYPE and 3 other fields High correlation
FILING_TYPE is highly correlated with CONTROL_NO and 4 other fields High correlation
CYCLE is highly correlated with CONTROL_NO and 3 other fields High correlation
BIN is highly correlated with BOROUGH and 1 other field High correlation
BOROUGH is highly correlated with BIN High correlation
BLOCK is highly correlated with BIN High correlation
CURRENT_STATUS is highly correlated with CONTROL_NO and 3 other fields High correlation
FILING_STATUS is highly correlated with CONTROL_NO and 3 other fields High correlation
PRIOR_STATUS is highly correlated with FILING_TYPE High correlation
SEQUENCE_NO has 2333 (3.7%) missing values Missing
SUBMITTED_ON has 12356 (19.4%) missing values Missing
QEWI_NAME has 13769 (21.7%) missing values Missing
QEWI_BUS_NAME has 14523 (22.8%) missing values Missing
QEWI_BUS_STREET_NAME has 12392 (19.5%) missing values Missing
QEWI_CITY has 12887 (20.3%) missing values Missing
QEWI_STATE has 12395 (19.5%) missing values Missing
QEWI_ZIP has 44195 (69.5%) missing values Missing
QEWI_NYS_LIC_NO has 44173 (69.5%) missing values Missing
OWNER_NAME has 44158 (69.5%) missing values Missing
OWNER_BUS_NAME has 11812 (18.6%) missing values Missing
OWNER_BUS_STREET_NAME has 63578 (100.0%) missing values Missing
OWNER_CITY has 63578 (100.0%) missing values Missing
OWNER_ZIP has 63578 (100.0%) missing values Missing
OWNER_STATE has 63578 (100.0%) missing values Missing
FILING_DATE has 12782 (20.1%) missing values Missing
PRIOR_CYCLE_FILING_DATE has 20508 (32.3%) missing values Missing
PRIOR_STATUS has 17851 (28.1%) missing values Missing
FIELD_INSPECTION_COMPLETED_DATE has 16664 (26.2%) missing values Missing
QEWI_SIGNED_DATE has 17763 (27.9%) missing values Missing
LATE_FILING_AMT has 1385 (2.2%) missing values Missing
FAILURE_TO_FILE_AMT has 1379 (2.2%) missing values Missing
FAILURE_TO_COLLECT_AMT has 1200 (1.9%) missing values Missing
COMMENTS has 45747 (72.0%) missing values Missing
FAILURE_TO_COLLECT_AMT is highly skewed (γ1 = 20.51379133) Skewed
TR6_NO is uniformly distributed Uniform
SEQUENCE_NO is an unsupported type, check if it needs cleaning or further analysis Unsupported
OWNER_BUS_STREET_NAME is an unsupported type, check if it needs cleaning or further analysis Unsupported
OWNER_CITY is an unsupported type, check if it needs cleaning or further analysis Unsupported
OWNER_ZIP is an unsupported type, check if it needs cleaning or further analysis Unsupported
OWNER_STATE is an unsupported type, check if it needs cleaning or further analysis Unsupported
LATE_FILING_AMT has 19300 (30.4%) zeros Zeros
FAILURE_TO_FILE_AMT has 41113 (64.7%) zeros Zeros
FAILURE_TO_COLLECT_AMT has 51488 (81.0%) zeros Zeros

Reproduction

Analysis started: 2022-06-30 21:04:25.851923
Analysis finished: 2022-06-30 21:04:50.440029
Duration: 24.59 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json

Variables

TR6_NO
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 63556
Distinct (%): > 99.9%
Missing: 0
Missing (%): 0.0%
Memory size: 496.8 KiB

TR6-814993-8A-N1  4
TR6-610070-NA-I1  4
TR6-815008-8B-N1  4
TR6-812231-8B-N1  4
TR6-815008-8B-I2  2
Other values (63551)  63560

Length

Max length: 17
Median length: 16
Mean length: 16.00003146
Min length: 16

Characters and Unicode

Total characters: 1017250
Distinct characters: 19
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
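
As an illustration of those character properties, the sketch below counts the Unicode general categories in one of the report's sample TR6_NO values using Python's standard `unicodedata` module. This is not necessarily how the profiling tool computes its figures; the `names` mapping is an illustrative helper that only covers the categories occurring in this value.

```python
import unicodedata
from collections import Counter

# One of the sample TR6_NO values shown in this report.
sample = "TR6-913448-9A-N1"

# Map each character to its two-letter Unicode general category code.
categories = Counter(unicodedata.category(ch) for ch in sample)

# Long names for the codes that occur here, matching the report's labels.
names = {
    "Lu": "Uppercase Letter",
    "Nd": "Decimal Number",
    "Pd": "Dash Punctuation",
}

for code, count in categories.most_common():
    print(f"{names.get(code, code)}: {count}")
```

The value contains three distinct categories, in line with the "Distinct categories" figure reported for the whole column.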

Unique

Unique: 63542
Unique (%): 99.9%

Sample

1st row: TR6-913448-9A-N1
2nd row: TR6-913451-9A-N1
3rd row: TR6-913456-9A-N1
4th row: TR6-913458-9A-N1
5th row: TR6-913460-9A-N1

Common Values

Value  Count  Frequency (%)
TR6-814993-8A-N1  4  < 0.1%
TR6-610070-NA-I1  4  < 0.1%
TR6-815008-8B-N1  4  < 0.1%
TR6-812231-8B-N1  4  < 0.1%
TR6-815008-8B-I2  2  < 0.1%
TR6-613144-NA-N1  2  < 0.1%
TR6-601435-NA-I1  2  < 0.1%
TR6-800351-8A-S1  2  < 0.1%
TR6-613144-NA-I1  2  < 0.1%
TR6-613144-NA-A1  2  < 0.1%
Other values (63546)  63550  > 99.9%

Length

[Figure: Histogram of lengths of the category]

Value  Count  Frequency (%)
tr6-814993-8a-n1  4  < 0.1%
tr6-815008-8b-n1  4  < 0.1%
tr6-812231-8b-n1  4  < 0.1%
tr6-610070-na-i1  4  < 0.1%
tr6-613144-na-a1  2  < 0.1%
tr6-815013-8b-i1  2  < 0.1%
tr6-613144-na-s1  2  < 0.1%
tr6-812231-8b-i1  2  < 0.1%
tr6-601435-na-n1  2  < 0.1%
tr6-613144-na-i1  2  < 0.1%
Other values (63546)  63550  > 99.9%

Most occurring characters

Value  Count  Frequency (%)
-  190734  18.7%
1  109905  10.8%
6  103851  10.2%
0  71191  7.0%
8  67294  6.6%
T  63578  6.2%
R  63578  6.2%
7  55220  5.4%
A  46084  4.5%
9  44317  4.4%
Other values (9)  201498  19.8%

Most occurring categories

Value  Count  Frequency (%)
Decimal Number  556239  54.7%
Uppercase Letter  270277  26.6%
Dash Punctuation  190734  18.7%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
1  109905  19.8%
6  103851  18.7%
0  71191  12.8%
8  67294  12.1%
7  55220  9.9%
9  44317  8.0%
2  29007  5.2%
3  26478  4.8%
4  25288  4.5%
5  23688  4.3%
Uppercase Letter
Value  Count  Frequency (%)
T  63578  23.5%
R  63578  23.5%
A  46084  17.1%
I  43361  16.0%
N  28292  10.5%
B  12314  4.6%
C  11743  4.3%
S  1327  0.5%
Dash Punctuation
Value  Count  Frequency (%)
-  190734  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Common  746973  73.4%
Latin  270277  26.6%

Most frequent character per script

Common
Value  Count  Frequency (%)
-  190734  25.5%
1  109905  14.7%
6  103851  13.9%
0  71191  9.5%
8  67294  9.0%
7  55220  7.4%
9  44317  5.9%
2  29007  3.9%
3  26478  3.5%
4  25288  3.4%
Latin
Value  Count  Frequency (%)
T  63578  23.5%
R  63578  23.5%
A  46084  17.1%
I  43361  16.0%
N  28292  10.5%
B  12314  4.6%
C  11743  4.3%
S  1327  0.5%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  1017250  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
-  190734  18.7%
1  109905  10.8%
6  103851  10.2%
0  71191  7.0%
8  67294  6.6%
T  63578  6.2%
R  63578  6.2%
7  55220  5.4%
A  46084  4.5%
9  44317  4.4%
Other values (9)  201498  19.8%

CONTROL_NO
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 49000
Distinct (%): 77.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 749000.7584
Minimum: 600001
Maximum: 919118
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 496.8 KiB

Quantile statistics

Minimum: 600001
5-th percentile: 602634.85
Q1: 613118.25
median: 800117.5
Q3: 811863
95-th percentile: 911902.15
Maximum: 919118
Range: 319117
Interquartile range (IQR): 198744.75

Descriptive statistics

Standard deviation: 104353.8707
Coefficient of variation (CV): 0.1393241189
Kurtosis: -1.183629215
Mean: 749000.7584
Median Absolute Deviation (MAD): 95285
Skewness: -0.006453664325
Sum: 4.761997022 × 10^10
Variance: 1.088973034 × 10^10
Monotonicity: Not monotonic
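
Several of the CONTROL_NO figures are derived from others in the same block. The sketch below recomputes them from the reported values, assuming the usual definitions (range = max - min, IQR = Q3 - Q1, CV = standard deviation / mean, variance = standard deviation squared, sum = mean times number of non-missing rows). The inputs are the rounded values printed in the report, so the last digits may differ slightly.

```python
# Figures copied from the CONTROL_NO statistics in this report.
minimum, maximum = 600001, 919118
q1, q3 = 613118.25, 811863
mean, std = 749000.7584, 104353.8707
n_observations = 63578  # CONTROL_NO has no missing values

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range
cv = std / mean                   # Coefficient of variation
variance = std ** 2               # Variance
total = mean * n_observations     # Sum

print(value_range)  # 319117
print(iqr)          # 198744.75
print(f"CV: {cv:.6f}")
print(f"Variance: {variance:.6e}")
print(f"Sum: {total:.6e}")
```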
[Figure: Histogram with fixed size bins (bins=50)]

Common values
Value  Count  Frequency (%)
613144  8  < 0.1%
814901  6  < 0.1%
812231  6  < 0.1%
814903  6  < 0.1%
815008  6  < 0.1%
916004  5  < 0.1%
816552  5  < 0.1%
802224  5  < 0.1%
801989  5  < 0.1%
807142  5  < 0.1%
Other values (48990)  63521  99.9%

Minimum 10 values
Value  Count  Frequency (%)
600001  1  < 0.1%
600003  1  < 0.1%
600004  1  < 0.1%
600005  1  < 0.1%
600006  1  < 0.1%
600007  1  < 0.1%
600008  1  < 0.1%
600009  1  < 0.1%
600010  1  < 0.1%
600011  1  < 0.1%

Maximum 10 values
Value  Count  Frequency (%)
919118  1  < 0.1%
919116  1  < 0.1%
919115  1  < 0.1%
919114  1  < 0.1%
919113  1  < 0.1%
919110  1  < 0.1%
919109  2  < 0.1%
919108  1  < 0.1%
919106  1  < 0.1%
919105  1  < 0.1%

FILING_TYPE
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 496.8 KiB

Initial  43361
Auto-Generated  12327
Amended  6563
Subsequent  1327

Length

Max length: 14
Median length: 7
Mean length: 8.419830759
Min length: 7

Characters and Unicode

Total characters: 535316
Distinct characters: 19
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Auto-Generated
2nd row: Auto-Generated
3rd row: Auto-Generated
4th row: Auto-Generated
5th row: Auto-Generated

Common Values

Value  Count  Frequency (%)
Initial  43361  68.2%
Auto-Generated  12327  19.4%
Amended  6563  10.3%
Subsequent  1327  2.1%
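
The frequency column is each count divided by the number of observations. A minimal sketch using only the standard library, with the FILING_TYPE counts from this report, is given below. The `frequency_table` helper is hypothetical, written for this illustration, and is not a pandas-profiling function.

```python
from collections import Counter

# Counts copied from the FILING_TYPE "Common Values" table in this report.
counts = Counter({
    "Initial": 43361,
    "Auto-Generated": 12327,
    "Amended": 6563,
    "Subsequent": 1327,
})

total = sum(counts.values())  # FILING_TYPE has no missing values


def frequency_table(counter):
    """Return (value, count, percent) rows, most common first."""
    n = sum(counter.values())
    return [
        (value, count, round(100 * count / n, 1))
        for value, count in counter.most_common()
    ]


rows = frequency_table(counts)
for value, count, pct in rows:
    print(f"{value:<15} {count:>6} {pct:>5.1f}%")
```

The four counts add up to the 63578 observations, and the rounded percentages match the table.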

Length

[Figure: Histogram of lengths of the category]

Category Frequency Plot

Value  Count  Frequency (%)
initial  43361  68.2%
auto-generated  12327  19.4%
amended  6563  10.3%
subsequent  1327  2.1%

Most occurring characters

Value  Count  Frequency (%)
i  86722  16.2%
t  69342  13.0%
n  63578  11.9%
a  55688  10.4%
e  52761  9.9%
I  43361  8.1%
l  43361  8.1%
d  25453  4.8%
A  18890  3.5%
u  14981  2.8%
Other values (9)  61179  11.4%

Most occurring categories

Value  Count  Frequency (%)
Lowercase Letter  447084  83.5%
Uppercase Letter  75905  14.2%
Dash Punctuation  12327  2.3%

Most frequent character per category

Lowercase Letter
Value  Count  Frequency (%)
i  86722  19.4%
t  69342  15.5%
n  63578  14.2%
a  55688  12.5%
e  52761  11.8%
l  43361  9.7%
d  25453  5.7%
u  14981  3.4%
r  12327  2.8%
o  12327  2.8%
Other values (4)  10544  2.4%
Uppercase Letter
Value  Count  Frequency (%)
I  43361  57.1%
A  18890  24.9%
G  12327  16.2%
S  1327  1.7%
Dash Punctuation
Value  Count  Frequency (%)
-  12327  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Latin  522989  97.7%
Common  12327  2.3%

Most frequent character per script

Latin
Value  Count  Frequency (%)
i  86722  16.6%
t  69342  13.3%
n  63578  12.2%
a  55688  10.6%
e  52761  10.1%
I  43361  8.3%
l  43361  8.3%
d  25453  4.9%
A  18890  3.6%
u  14981  2.9%
Other values (8)  48852  9.3%
Common
Value  Count  Frequency (%)
-  12327  100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  535316  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
i  86722  16.2%
t  69342  13.0%
n  63578  11.9%
a  55688  10.4%
e  52761  9.9%
I  43361  8.1%
l  43361  8.1%
d  25453  4.8%
A  18890  3.5%
u  14981  2.8%
Other values (9)  61179  11.4%

CYCLE
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 496.8 KiB

8  21555
6  15965
7  15662
9  10396

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 63578
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 9
2nd row: 9
3rd row: 9
4th row: 9
5th row: 9

Common Values

Value  Count  Frequency (%)
8  21555  33.9%
6  15965  25.1%
7  15662  24.6%
9  10396  16.4%

Length

[Figure: Histogram of lengths of the category]

Category Frequency Plot

Value  Count  Frequency (%)
8  21555  33.9%
6  15965  25.1%
7  15662  24.6%
9  10396  16.4%

Most occurring characters

Value  Count  Frequency (%)
8  21555  33.9%
6  15965  25.1%
7  15662  24.6%
9  10396  16.4%

Most occurring categories

Value  Count  Frequency (%)
Decimal Number  63578  100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
8  21555  33.9%
6  15965  25.1%
7  15662  24.6%
9  10396  16.4%

Most occurring scripts

Value  Count  Frequency (%)
Common  63578  100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
8  21555  33.9%
6  15965  25.1%
7  15662  24.6%
9  10396  16.4%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  63578  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
8  21555  33.9%
6  15965  25.1%
7  15662  24.6%
9  10396  16.4%

BIN
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 15680
Distinct (%): 24.7%
Missing: 28
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 1940923.106
Minimum: 1000000
Maximum: 5863301
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 496.8 KiB

Quantile statistics

Minimum: 1000000
5-th percentile: 1007979.05
Q1: 1035344.5
median: 1081661
Q3: 3110168
95-th percentile: 4432026
Maximum: 5863301
Range: 4863301
Interquartile range (IQR): 2074823.5

Descriptive statistics

Standard deviation: 1219255.774
Coefficient of variation (CV): 0.6281834505
Kurtosis: -0.5214484333
Mean: 1940923.106
Median Absolute Deviation (MAD): 68100
Skewness: 0.9819193684
Sum: 1.233456634 × 10^11
Variance: 1.486584643 × 10^12
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common values
Value  Count  Frequency (%)
1084781  38  0.1%
1000000  25  < 0.1%
3335934  20  < 0.1%
1077591  19  < 0.1%
3253907  19  < 0.1%
1077585  18  < 0.1%
1081661  17  < 0.1%
1088305  16  < 0.1%
3345581  16  < 0.1%
1087284  16  < 0.1%
Other values (15670)  63346  99.6%
(Missing)  28  < 0.1%

Minimum 10 values
Value  Count  Frequency (%)
1000000  25  < 0.1%
1000005  5  < 0.1%
1000006  6  < 0.1%
1000007  5  < 0.1%
1000016  6  < 0.1%
1000018  4  < 0.1%
1000020  4  < 0.1%
1000021  5  < 0.1%
1000023  5  < 0.1%
1000024  5  < 0.1%

Maximum 10 values
Value  Count  Frequency (%)
5863301  1  < 0.1%
5160021  6  < 0.1%
5158679  3  < 0.1%
5158313  5  < 0.1%
5157567  6  < 0.1%
5157402  4  < 0.1%
5156898  1  < 0.1%
5150768  1  < 0.1%
5141912  1  < 0.1%
5122638  5  < 0.1%

HOUSE_NO
Categorical

HIGH CARDINALITY

Distinct: 4189
Distinct (%): 6.6%
Missing: 0
Missing (%): 0.0%
Memory size: 496.8 KiB

1  364
30  342
50  342
40  331
200  326
Other values (4184)  61873

Length

Max length: 7
Median length: 6
Mean length: 3.25751046
Min length: 1

Characters and Unicode

Total characters: 207106
Distinct characters: 13
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 251
Unique (%): 0.4%

Sample

1st row: 143-45
2nd row: 15
3rd row: 180
4th row: 41-46
5th row: 220

Common Values

Value  Count  Frequency (%)
1  364  0.6%
30  342  0.5%
50  342  0.5%
40  331  0.5%
200  326  0.5%
60  320  0.5%
100  317  0.5%
20  297  0.5%
15  280  0.4%
150  279  0.4%
Other values (4179)  60380  95.0%

Length

[Figure: Histogram of lengths of the category]

Value  Count  Frequency (%)
1  364  0.6%
30  342  0.5%
50  342  0.5%
40  331  0.5%
200  326  0.5%
60  320  0.5%
100  317  0.5%
20  297  0.5%
15  280  0.4%
150  279  0.4%
Other values (4178)  60384  95.0%

Most occurring characters

Value  Count  Frequency (%)
1  37633  18.2%
0  28034  13.5%
2  25441  12.3%
5  23695  11.4%
3  20974  10.1%
4  18270  8.8%
6  13197  6.4%
7  11531  5.6%
8  10920  5.3%
9  9912  4.8%
Other values (3)  7499  3.6%

Most occurring categories

Value  Count  Frequency (%)
Decimal Number  199607  96.4%
Dash Punctuation  7491  3.6%
Space Separator  4  < 0.1%
Uppercase Letter  4  < 0.1%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
1  37633  18.9%
0  28034  14.0%
2  25441  12.7%
5  23695  11.9%
3  20974  10.5%
4  18270  9.2%
6  13197  6.6%
7  11531  5.8%
8  10920  5.5%
9  9912  5.0%
Dash Punctuation
Value  Count  Frequency (%)
-  7491  100.0%
Space Separator
Value  Count  Frequency (%)
(space)  4  100.0%
Uppercase Letter
Value  Count  Frequency (%)
A  4  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Common  207102  > 99.9%
Latin  4  < 0.1%

Most frequent character per script

Common
Value  Count  Frequency (%)
1  37633  18.2%
0  28034  13.5%
2  25441  12.3%
5  23695  11.4%
3  20974  10.1%
4  18270  8.8%
6  13197  6.4%
7  11531  5.6%
8  10920  5.3%
9  9912  4.8%
Other values (2)  7495  3.6%
Latin
Value  Count  Frequency (%)
A  4  100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  207106  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
1  37633  18.2%
0  28034  13.5%
2  25441  12.3%
5  23695  11.4%
3  20974  10.1%
4  18270  8.8%
6  13197  6.4%
7  11531  5.6%
8  10920  5.3%
9  9912  4.8%
Other values (3)  7499  3.6%

STREET_NAME
Categorical

HIGH CARDINALITY

Distinct: 2516
Distinct (%): 4.0%
Missing: 0
Missing (%): 0.0%
Memory size: 496.8 KiB

BROADWAY  1820
FIFTH AVENUE  1247
PARK AVENUE  975
MADISON AVENUE  815
RIVERSIDE DRIVE  661
Other values (2511)  58060

Length

Max length: 32
Median length: 29
Mean length: 13.6754695
Min length: 6

Characters and Unicode

Total characters: 869459
Distinct characters: 67
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 143
Unique (%): 0.2%

Sample

1st row: SANFORD AVENUE
2nd row: OLIVER STREET
3rd row: ELDRIDGE STREET
4th row: 50 STREET
5th row: EAST 19 STREET

Common Values

Value  Count  Frequency (%)
BROADWAY  1820  2.9%
FIFTH AVENUE  1247  2.0%
PARK AVENUE  975  1.5%
MADISON AVENUE  815  1.3%
RIVERSIDE DRIVE  661  1.0%
WEST END AVENUE  641  1.0%
LEXINGTON AVENUE  543  0.9%
THIRD AVENUE  481  0.8%
SECOND AVENUE  392  0.6%
7 AVENUE  376  0.6%
Other values (2506)  55627  87.5%

Length

[Figure: Histogram of lengths of the category]

Value  Count  Frequency (%)
street  31448  20.6%
avenue  20726  13.6%
west  12301  8.1%
east  10657  7.0%
park  1962  1.3%
broadway  1944  1.3%
boulevard  1804  1.2%
place  1413  0.9%
road  1382  0.9%
drive  1296  0.8%
Other values (1462)  67680  44.3%

Most occurring characters

Value  Count  Frequency (%)
E  156557  18.0%
T  103075  11.9%
(space)  92144  10.6%
S  69441  8.0%
A  62986  7.2%
R  61480  7.1%
N  43496  5.0%
U  28797  3.3%
V  26947  3.1%
O  25180  2.9%
Other values (57)  199356  22.9%

Most occurring categories

Value  Count  Frequency (%)
Uppercase Letter  717355  82.5%
Space Separator  92144  10.6%
Decimal Number  58829  6.8%
Lowercase Letter  1047  0.1%
Other Punctuation  55  < 0.1%
Dash Punctuation  17  < 0.1%
Close Punctuation  6  < 0.1%
Open Punctuation  6  < 0.1%

Most frequent character per category

Uppercase Letter
Value  Count  Frequency (%)
E  156557  21.8%
T  103075  14.4%
S  69441  9.7%
A  62986  8.8%
R  61480  8.6%
N  43496  6.1%
U  28797  4.0%
V  26947  3.8%
O  25180  3.5%
W  19805  2.8%
Other values (16)  119591  16.7%
Lowercase Letter
Value  Count  Frequency (%)
e  237  22.6%
t  176  16.8%
r  97  9.3%
n  81  7.7%
s  78  7.4%
a  58  5.5%
u  51  4.9%
o  42  4.0%
v  40  3.8%
d  34  3.2%
Other values (13)  153  14.6%
Decimal Number
Value  Count  Frequency (%)
1  11743  20.0%
2  6859  11.7%
3  6229  10.6%
7  5893  10.0%
5  5459  9.3%
4  5193  8.8%
8  5075  8.6%
6  4920  8.4%
9  3926  6.7%
0  3532  6.0%
Other Punctuation
Value  Count  Frequency (%)
.  26  47.3%
#  14  25.5%
'  11  20.0%
&  4  7.3%
Space Separator
Value  Count  Frequency (%)
(space)  92144  100.0%
Dash Punctuation
Value  Count  Frequency (%)
-  17  100.0%
Close Punctuation
Value  Count  Frequency (%)
)  6  100.0%
Open Punctuation
Value  Count  Frequency (%)
(  6  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Latin  718402  82.6%
Common  151057  17.4%

Most frequent character per script

Latin
Value  Count  Frequency (%)
E  156557  21.8%
T  103075  14.3%
S  69441  9.7%
A  62986  8.8%
R  61480  8.6%
N  43496  6.1%
U  28797  4.0%
V  26947  3.8%
O  25180  3.5%
W  19805  2.8%
Other values (39)  120638  16.8%
Common
Value  Count  Frequency (%)
(space)  92144  61.0%
1  11743  7.8%
2  6859  4.5%
3  6229  4.1%
7  5893  3.9%
5  5459  3.6%
4  5193  3.4%
8  5075  3.4%
6  4920  3.3%
9  3926  2.6%
Other values (8)  3616  2.4%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  869459  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
E  156557  18.0%
T  103075  11.9%
(space)  92144  10.6%
S  69441  8.0%
A  62986  7.2%
R  61480  7.1%
N  43496  5.0%
U  28797  3.3%
V  26947  3.1%
O  25180  2.9%
Other values (57)  199356  22.9%

BOROUGH
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 496.8 KiB

MANHATTAN  37205
BROOKLYN  9332
BRONX  8573
QUEENS  7851
STATEN ISLAND  617

Length

Max length: 13
Median length: 9
Mean length: 7.982210828
Min length: 5

Characters and Unicode

Total characters: 507493
Distinct characters: 19
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: QUEENS
2nd row: BROOKLYN
3rd row: MANHATTAN
4th row: QUEENS
5th row: MANHATTAN

Common Values

Value  Count  Frequency (%)
MANHATTAN  37205  58.5%
BROOKLYN  9332  14.7%
BRONX  8573  13.5%
QUEENS  7851  12.3%
STATEN ISLAND  617  1.0%

Length

[Figure: Histogram of lengths of the category]

Category Frequency Plot

Value  Count  Frequency (%)
manhattan  37205  58.0%
brooklyn  9332  14.5%
bronx  8573  13.4%
queens  7851  12.2%
staten  617  1.0%
island  617  1.0%

Most occurring characters

Value  Count  Frequency (%)
A  112849  22.2%
N  101400  20.0%
T  75644  14.9%
M  37205  7.3%
H  37205  7.3%
O  27237  5.4%
B  17905  3.5%
R  17905  3.5%
E  16319  3.2%
L  9949  2.0%
Other values (9)  53875  10.6%

Most occurring categories

Value  Count  Frequency (%)
Uppercase Letter  506876  99.9%
Space Separator  617  0.1%

Most frequent character per category

Uppercase Letter
Value  Count  Frequency (%)
A  112849  22.3%
N  101400  20.0%
T  75644  14.9%
M  37205  7.3%
H  37205  7.3%
O  27237  5.4%
B  17905  3.5%
R  17905  3.5%
E  16319  3.2%
L  9949  2.0%
Other values (8)  53258  10.5%
Space Separator
Value  Count  Frequency (%)
(space)  617  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Latin  506876  99.9%
Common  617  0.1%

Most frequent character per script

Latin
Value  Count  Frequency (%)
A  112849  22.3%
N  101400  20.0%
T  75644  14.9%
M  37205  7.3%
H  37205  7.3%
O  27237  5.4%
B  17905  3.5%
R  17905  3.5%
E  16319  3.2%
L  9949  2.0%
Other values (8)  53258  10.5%
Common
Value  Count  Frequency (%)
(space)  617  100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  507493  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
A  112849  22.2%
N  101400  20.0%
T  75644  14.9%
M  37205  7.3%
H  37205  7.3%
O  27237  5.4%
B  17905  3.5%
R  17905  3.5%
E  16319  3.2%
L  9949  2.0%
Other values (9)  53875  10.6%

BLOCK
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 3678
Distinct (%): 5.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2347.533581
Minimum: 1
Maximum: 99999
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 496.8 KiB

Quantile statistics

Minimum: 1
5-th percentile: 199
Q1: 864.25
median: 1505
Q3: 2830
95-th percentile: 7242
Maximum: 99999
Range: 99998
Interquartile range (IQR): 1965.75

Descriptive statistics

Standard deviation: 2702.322236
Coefficient of variation (CV): 1.151132516
Kurtosis: 219.8598793
Mean: 2347.533581
Median Absolute Deviation (MAD): 710
Skewness: 7.927977306
Sum: 149251490
Variance: 7302545.468
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common values
Value  Count  Frequency (%)
3944  241  0.4%
16  219  0.3%
2179  187  0.3%
4905  169  0.3%
2180  162  0.3%
8329  156  0.2%
4452  155  0.2%
2139  147  0.2%
3943  145  0.2%
1344  130  0.2%
Other values (3668)  61867  97.3%

Minimum 10 values
Value  Count  Frequency (%)
1  39  0.1%
3  9  < 0.1%
4  5  < 0.1%
5  24  < 0.1%
6  13  < 0.1%
8  25  < 0.1%
9  13  < 0.1%
10  23  < 0.1%
11  17  < 0.1%
13  24  < 0.1%

Maximum 10 values
Value  Count  Frequency (%)
99999  8  < 0.1%
16234  7  < 0.1%
16233  5  < 0.1%
16231  5  < 0.1%
16230  10  < 0.1%
16229  6  < 0.1%
16228  1  < 0.1%
16227  3  < 0.1%
16226  10  < 0.1%
16186  4  < 0.1%

LOT
Real number (ℝ≥0)

Distinct: 446
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1128.72319
Minimum: 0
Maximum: 9100
Zeros: 3
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 496.8 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 7
median: 31
Q3: 71
95-th percentile: 7502
Maximum: 9100
Range: 9100
Interquartile range (IQR): 64

Descriptive statistics

Standard deviation: 2632.747633
Coefficient of variation (CV): 2.332500701
Kurtosis: 2.061266009
Mean: 1128.72319
Median Absolute Deviation (MAD): 29
Skewness: 2.011999916
Sum: 71761963
Variance: 6931360.099
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common values
Value  Count  Frequency (%)
1  12066  19.0%
7501  5021  7.9%
7502  1958  3.1%
2  1103  1.7%
10  953  1.5%
7503  950  1.5%
20  905  1.4%
29  884  1.4%
21  856  1.3%
15  799  1.3%
Other values (436)  38083  59.9%

Minimum 10 values
Value  Count  Frequency (%)
0  3  < 0.1%
1  12066  19.0%
2  1103  1.7%
3  497  0.8%
4  372  0.6%
5  717  1.1%
6  588  0.9%
7  767  1.2%
8  611  1.0%
9  554  0.9%

Maximum 10 values
Value  Count  Frequency (%)
9100  3  < 0.1%
9080  13  < 0.1%
9078  6  < 0.1%
9059  8  < 0.1%
9029  6  < 0.1%
9021  1  < 0.1%
9020  5  < 0.1%
9010  3  < 0.1%
9005  4  < 0.1%
9001  23  < 0.1%

SEQUENCE_NO
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing: 2333
Missing (%): 3.7%
Memory size: 496.8 KiB

SUBMITTED_ON
Categorical

HIGH CARDINALITY
MISSING

Distinct: 4461
Distinct (%): 8.7%
Missing: 12356
Missing (%): 19.4%
Memory size: 496.8 KiB

2007-02-21 00:00:00  1215
2007-02-20 00:00:00  733
2012-02-21 00:00:00  642
2017-02-21 00:00:00  583
2022-02-21 00:00:00  511
Other values (4456)  47538

Length

Max length: 19
Median length: 19
Mean length: 19
Min length: 19

Characters and Unicode

Total characters: 973218
Distinct characters: 13
Distinct categories: 4
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 441
Unique (%): 0.9%

Sample

1st row: 2012-02-21 00:00:00
2nd row: 2012-11-07 00:00:00
3rd row: 2012-03-26 00:00:00
4th row: 2012-08-20 00:00:00
5th row: 2011-11-10 00:00:00

Common Values

Value  Count  Frequency (%)
2007-02-21 00:00:00  1215  1.9%
2007-02-20 00:00:00  733  1.2%
2012-02-21 00:00:00  642  1.0%
2017-02-21 00:00:00  583  0.9%
2022-02-21 00:00:00  511  0.8%
2022-02-18 00:00:00  446  0.7%
2007-02-16 00:00:00  408  0.6%
2018-02-21 00:00:00  405  0.6%
2019-02-21 00:00:00  367  0.6%
2012-08-21 00:00:00  351  0.6%
Other values (4451)  45561  71.7%
(Missing)  12356  19.4%

Length

[Figure: Histogram of lengths of the category]

Value  Count  Frequency (%)
00:00:00  51055  49.8%
2007-02-21  1215  1.2%
2007-02-20  733  0.7%
2012-02-21  642  0.6%
2017-02-21  583  0.6%
2022-02-21  511  0.5%
2022-02-18  446  0.4%
2007-02-16  408  0.4%
2018-02-21  405  0.4%
2019-02-21  367  0.4%
Other values (4453)  46079  45.0%

Most occurring characters

Value  Count  Frequency (%)
0  437086  44.9%
2  111981  11.5%
-  102444  10.5%
:  102444  10.5%
1  78887  8.1%
(space)  51222  5.3%
7  18387  1.9%
8  14136  1.5%
3  13435  1.4%
6  12560  1.3%
Other values (3)  30636  3.1%

Most occurring categories

Value  Count  Frequency (%)
Decimal Number  717108  73.7%
Dash Punctuation  102444  10.5%
Other Punctuation  102444  10.5%
Space Separator  51222  5.3%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0  437086  61.0%
2  111981  15.6%
1  78887  11.0%
7  18387  2.6%
8  14136  2.0%
3  13435  1.9%
6  12560  1.8%
9  12324  1.7%
5  10077  1.4%
4  8235  1.1%
Dash Punctuation
Value  Count  Frequency (%)
-  102444  100.0%
Other Punctuation
Value  Count  Frequency (%)
:  102444  100.0%
Space Separator
Value  Count  Frequency (%)
(space)  51222  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Common  973218  100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0  437086  44.9%
2  111981  11.5%
-  102444  10.5%
:  102444  10.5%
1  78887  8.1%
(space)  51222  5.3%
7  18387  1.9%
8  14136  1.5%
3  13435  1.4%
6  12560  1.3%
Other values (3)  30636  3.1%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  973218  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0  437086  44.9%
2  111981  11.5%
-  102444  10.5%
:  102444  10.5%
1  78887  8.1%
(space)  51222  5.3%
7  18387  1.9%
8  14136  1.5%
3  13435  1.4%
6  12560  1.3%
Other values (3)  30636  3.1%

CURRENT_STATUS
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): < 0.1%
Missing: 271
Missing (%): 0.4%
Memory size: 496.8 KiB

SAFE  29138
SWARMP  21423
No Report Filed  7580
UNSAFE  5166

Length

Max length: 15
Median length: 6
Mean length: 6.157075837
Min length: 4

Characters and Unicode

Total characters: 389786
Distinct characters: 19
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: No Report Filed
2nd row: UNSAFE
3rd row: No Report Filed
4th row: No Report Filed
5th row: SAFE

Common Values

Value  Count  Frequency (%)
SAFE  29138  45.8%
SWARMP  21423  33.7%
No Report Filed  7580  11.9%
UNSAFE  5166  8.1%
(Missing)  271  0.4%

Length

[Figure: Histogram of lengths of the category]

Category Frequency Plot

Value  Count  Frequency (%)
safe  29138  37.1%
swarmp  21423  27.3%
no  7580  9.7%
report  7580  9.7%
filed  7580  9.7%
unsafe  5166  6.6%

Most occurring characters

Value  Count  Frequency (%)
S  55727  14.3%
A  55727  14.3%
F  41884  10.7%
E  34304  8.8%
R  29003  7.4%
W  21423  5.5%
M  21423  5.5%
P  21423  5.5%
(space)  15160  3.9%
e  15160  3.9%
Other values (9)  78552  20.2%

Most occurring categories

Value  Count  Frequency (%)
Uppercase Letter  298826  76.7%
Lowercase Letter  75800  19.4%
Space Separator  15160  3.9%

Most frequent character per category

Uppercase Letter
Value  Count  Frequency (%)
S  55727  18.6%
A  55727  18.6%
F  41884  14.0%
E  34304  11.5%
R  29003  9.7%
W  21423  7.2%
M  21423  7.2%
P  21423  7.2%
N  12746  4.3%
U  5166  1.7%
Lowercase Letter
Value  Count  Frequency (%)
e  15160  20.0%
o  15160  20.0%
p  7580  10.0%
r  7580  10.0%
t  7580  10.0%
i  7580  10.0%
l  7580  10.0%
d  7580  10.0%
Space Separator
Value  Count  Frequency (%)
(space)  15160  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Latin  374626  96.1%
Common  15160  3.9%

Most frequent character per script

Latin
Value  Count  Frequency (%)
S  55727  14.9%
A  55727  14.9%
F  41884  11.2%
E  34304  9.2%
R  29003  7.7%
W  21423  5.7%
M  21423  5.7%
P  21423  5.7%
e  15160  4.0%
o  15160  4.0%
Other values (8)  63392  16.9%
Common
Value  Count  Frequency (%)
(space)  15160  100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  389786  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
S  55727  14.3%
A  55727  14.3%
F  41884  10.7%
E  34304  8.8%
R  29003  7.4%
W  21423  5.5%
M  21423  5.5%
P  21423  5.5%
(space)  15160  3.9%
e  15160  3.9%
Other values (9)  78552  20.2%

QEWI_NAME
Categorical

HIGH CARDINALITY
MISSING

Distinct2111
Distinct (%)4.2%
Missing13769
Missing (%)21.7%
Memory size496.8 KiB
PAUL MILLMAN
 
1337
ALAN S EPSTEIN
 
1139
HOWARD L ZIMMERMAN
 
897
TIMOTHY WEBB
 
696
ANTHONY STASIO
 
695
Other values (2106)
45045 

Length

Max length27
Median length24
Mean length14.42695095
Min length1

Characters and Unicode

Total characters718592
Distinct characters55
Distinct categories7
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique846
Unique (%)1.7%

Sample

1st rowPAUL MILLMAN
2nd rowJAMES CICALO
3rd rowJAMES MODDY
4th rowCHARLES A MERRITT
5th rowSTANFORD CHAN

Common Values

Value                Count  Frequency (%)
PAUL MILLMAN         1337   2.1%
ALAN S EPSTEIN       1139   1.8%
HOWARD L ZIMMERMAN   897    1.4%
TIMOTHY WEBB         696    1.1%
ANTHONY STASIO       695    1.1%
HOWARD ZIMMERMAN     689    1.1%
BARIS ACAR           665    1.0%
CHARLES A MERRITT    623    1.0%
DAVID SALAMON        610    1.0%
JOSEPH CANTON        563    0.9%
Other values (2101)  41895  65.9%
(Missing)            13769  21.7%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
j2274
 
2.0%
s2081
 
1.8%
a2015
 
1.8%
alan1796
 
1.6%
l1793
 
1.6%
robert1708
 
1.5%
howard1622
 
1.4%
zimmerman1597
 
1.4%
epstein1575
 
1.4%
paul1522
 
1.3%
Other values (1772)95686
84.2%

Most occurring characters

ValueCountFrequency (%)
97197
13.5%
A76681
 
10.7%
E60423
 
8.4%
N51612
 
7.2%
R49037
 
6.8%
I45841
 
6.4%
L39414
 
5.5%
O38168
 
5.3%
S35377
 
4.9%
M30123
 
4.2%
Other values (45)194719
27.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter620310
86.3%
Space Separator97197
 
13.5%
Other Punctuation717
 
0.1%
Dash Punctuation264
 
< 0.1%
Lowercase Letter64
 
< 0.1%
Decimal Number33
 
< 0.1%
Modifier Symbol7
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A76681
12.4%
E60423
 
9.7%
N51612
 
8.3%
R49037
 
7.9%
I45841
 
7.4%
L39414
 
6.4%
O38168
 
6.2%
S35377
 
5.7%
M30123
 
4.9%
T28739
 
4.6%
Other values (16)164895
26.6%
Lowercase Letter
ValueCountFrequency (%)
i10
15.6%
l8
12.5%
e7
10.9%
a6
9.4%
o5
7.8%
k4
 
6.2%
r4
 
6.2%
m4
 
6.2%
n3
 
4.7%
c3
 
4.7%
Other values (6)10
15.6%
Other Punctuation
ValueCountFrequency (%)
'446
62.2%
.134
 
18.7%
,130
 
18.1%
?4
 
0.6%
:2
 
0.3%
#1
 
0.1%
Decimal Number
ValueCountFrequency (%)
327
81.8%
13
 
9.1%
72
 
6.1%
21
 
3.0%
Space Separator
ValueCountFrequency (%)
97197
100.0%
Dash Punctuation
ValueCountFrequency (%)
-264
100.0%
Modifier Symbol
ValueCountFrequency (%)
`7
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin620374
86.3%
Common98218
 
13.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
A76681
12.4%
E60423
 
9.7%
N51612
 
8.3%
R49037
 
7.9%
I45841
 
7.4%
L39414
 
6.4%
O38168
 
6.2%
S35377
 
5.7%
M30123
 
4.9%
T28739
 
4.6%
Other values (32)164959
26.6%
Common
ValueCountFrequency (%)
97197
99.0%
'446
 
0.5%
-264
 
0.3%
.134
 
0.1%
,130
 
0.1%
327
 
< 0.1%
`7
 
< 0.1%
?4
 
< 0.1%
13
 
< 0.1%
:2
 
< 0.1%
Other values (3)4
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII718592
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
97197
13.5%
A76681
 
10.7%
E60423
 
8.4%
N51612
 
7.2%
R49037
 
6.8%
I45841
 
6.4%
L39414
 
5.5%
O38168
 
5.3%
S35377
 
4.9%
M30123
 
4.2%
Other values (45)194719
27.1%

QEWI_BUS_NAME
Categorical

HIGH CARDINALITY
MISSING

Distinct4694
Distinct (%)9.6%
Missing14523
Missing (%)22.8%
Memory size496.8 KiB
EPSTEIN ENGINEERING, P.C
 
904
SUPERSTRUCTURES ENG. & ARCH
 
698
RAND ENGINEERING & ARCHITECTURE
 
626
MERRITT ENGINEERING CONSULTANTS
 
603
LAWLESS & MANGIONE, LLP
 
591
Other values (4689)
45633 

Length

Max length43
Median length32
Mean length23.82915095
Min length2

Characters and Unicode

Total characters1168939
Distinct characters74
Distinct categories10
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique2159
Unique (%)4.4%

Sample

1st rowSUPERSTRUCTURES ENG & ARCH
2nd rowFSI ARCHITECTURE, PC
3rd rowHEITMANN & ASSOCIATES, INC
4th rowMERRITT.ENGINEERING CONSULTANTS
5th rowIBA ARCHITECTS, PLLC

Common Values

Value                            Count  Frequency (%)
EPSTEIN ENGINEERING, P.C         904    1.4%
SUPERSTRUCTURES ENG. & ARCH      698    1.1%
RAND ENGINEERING & ARCHITECTURE  626    1.0%
MERRITT ENGINEERING CONSULTANTS  603    0.9%
LAWLESS & MANGIONE, LLP          591    0.9%
HLZIMMERMAN ARCHITECTS           567    0.9%
GANDHI ENGINEERING INC           497    0.8%
DEVON ARCHITECTS                 447    0.7%
SALAMON ENGINEERING PLLC         444    0.7%
RAND ENGINEERING & ARCHITECT     411    0.6%
Other values (4684)              43267  68.1%
(Missing)                        14523  22.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
engineering14177
 
8.7%
9121
 
5.6%
p.c6007
 
3.7%
architects5766
 
3.5%
pc5613
 
3.4%
architect4013
 
2.5%
architecture3434
 
2.1%
inc3387
 
2.1%
arch2901
 
1.8%
associates2413
 
1.5%
Other values (2685)106433
65.2%

Most occurring characters

ValueCountFrequency (%)
E132080
11.3%
114328
 
9.8%
N111860
 
9.6%
I89283
 
7.6%
R80811
 
6.9%
C80176
 
6.9%
A73568
 
6.3%
T70169
 
6.0%
S58933
 
5.0%
G51908
 
4.4%
Other values (64)305823
26.2%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter999718
85.5%
Space Separator114328
 
9.8%
Other Punctuation50552
 
4.3%
Lowercase Letter3199
 
0.3%
Math Symbol522
 
< 0.1%
Dash Punctuation427
 
< 0.1%
Decimal Number187
 
< 0.1%
Close Punctuation2
 
< 0.1%
Open Punctuation2
 
< 0.1%
Modifier Symbol2
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
E132080
13.2%
N111860
11.2%
I89283
8.9%
R80811
 
8.1%
C80176
 
8.0%
A73568
 
7.4%
T70169
 
7.0%
S58933
 
5.9%
G51908
 
5.2%
L41622
 
4.2%
Other values (16)209308
20.9%
Lowercase Letter
ValueCountFrequency (%)
a606
18.9%
m589
18.4%
p585
18.3%
e248
7.8%
n216
 
6.8%
i199
 
6.2%
r169
 
5.3%
g130
 
4.1%
t109
 
3.4%
c108
 
3.4%
Other values (13)240
 
7.5%
Decimal Number
ValueCountFrequency (%)
167
35.8%
242
22.5%
719
 
10.2%
417
 
9.1%
016
 
8.6%
57
 
3.7%
36
 
3.2%
95
 
2.7%
84
 
2.1%
64
 
2.1%
Other Punctuation
ValueCountFrequency (%)
.20031
39.6%
,19010
37.6%
&10226
20.2%
;588
 
1.2%
/370
 
0.7%
'318
 
0.6%
%6
 
< 0.1%
@3
 
< 0.1%
Math Symbol
ValueCountFrequency (%)
+521
99.8%
=1
 
0.2%
Space Separator
ValueCountFrequency (%)
114328
100.0%
Dash Punctuation
ValueCountFrequency (%)
-427
100.0%
Close Punctuation
ValueCountFrequency (%)
)2
100.0%
Open Punctuation
ValueCountFrequency (%)
(2
100.0%
Modifier Symbol
ValueCountFrequency (%)
`2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1002917
85.8%
Common166022
 
14.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
E132080
13.2%
N111860
11.2%
I89283
8.9%
R80811
 
8.1%
C80176
 
8.0%
A73568
 
7.3%
T70169
 
7.0%
S58933
 
5.9%
G51908
 
5.2%
L41622
 
4.2%
Other values (39)212507
21.2%
Common
ValueCountFrequency (%)
114328
68.9%
.20031
 
12.1%
,19010
 
11.5%
&10226
 
6.2%
;588
 
0.4%
+521
 
0.3%
-427
 
0.3%
/370
 
0.2%
'318
 
0.2%
167
 
< 0.1%
Other values (15)136
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII1168939
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
E132080
11.3%
114328
 
9.8%
N111860
 
9.6%
I89283
 
7.6%
R80811
 
6.9%
C80176
 
6.9%
A73568
 
6.3%
T70169
 
6.0%
S58933
 
5.0%
G51908
 
4.4%
Other values (64)305823
26.2%

QEWI_BUS_STREET_NAME
Categorical

HIGH CARDINALITY
MISSING

Distinct3901
Distinct (%)7.6%
Missing12392
Missing (%)19.5%
Memory size496.8 KiB
480 NORTH BROADWAY
 
1848
159 WEST 25TH STREET
 
1023
317 MADISON AVENUE, SUITE 915
 
812
11 WEST 30TH STREET
 
661
11 W 30 ST
 
631
Other values (3896)
46211 

Length

Max length42
Median length38
Mean length20.24436369
Min length4

Characters and Unicode

Total characters1036228
Distinct characters69
Distinct categories9
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1695
Unique (%)3.3%

Sample

1st row32 AVENUE OF THE AMERICAS
2nd row307 7TH AVENUE, SUITE 1001
3rd row20 WEST 22ND STREET, 17TH FLOOR
4th row28-08 BAYSIDE LANE
5th row232 MADISON AVENUE

Common Values

Value                             Count  Frequency (%)
480 NORTH BROADWAY                1848   2.9%
159 WEST 25TH STREET              1023   1.6%
317 MADISON AVENUE, SUITE 915     812    1.3%
11 WEST 30TH STREET               661    1.0%
11 W 30 ST                        631    1.0%
111 JOHN STREET                   611    1.0%
152 MADISON AVENUE                584    0.9%
159 WEST 25TH STREET, 12TH FLOOR  576    0.9%
32 AVENUE OF THE AMERICAS         525    0.8%
28-08 BAYSIDE LANE                516    0.8%
Other values (3891)               43399  68.3%
(Missing)                         12392  19.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
street19685
 
10.1%
avenue13118
 
6.7%
west12748
 
6.5%
suite4972
 
2.5%
broadway4428
 
2.3%
madison3637
 
1.9%
floor3209
 
1.6%
ave2449
 
1.3%
north2305
 
1.2%
road2287
 
1.2%
Other values (2288)126900
64.8%

Most occurring characters

ValueCountFrequency (%)
162407
15.7%
E113425
 
10.9%
T94290
 
9.1%
S57901
 
5.6%
A53124
 
5.1%
R48909
 
4.7%
141111
 
4.0%
N35419
 
3.4%
O32644
 
3.2%
228965
 
2.8%
Other values (59)368033
35.5%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter647409
62.5%
Decimal Number209188
 
20.2%
Space Separator162407
 
15.7%
Other Punctuation12496
 
1.2%
Dash Punctuation3823
 
0.4%
Lowercase Letter531
 
0.1%
Open Punctuation182
 
< 0.1%
Close Punctuation176
 
< 0.1%
Modifier Symbol16
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
E113425
17.5%
T94290
14.6%
S57901
8.9%
A53124
 
8.2%
R48909
 
7.6%
N35419
 
5.5%
O32644
 
5.0%
H27794
 
4.3%
U24572
 
3.8%
I23125
 
3.6%
Other values (16)136206
21.0%
Lowercase Letter
ValueCountFrequency (%)
t107
20.2%
e80
15.1%
r53
10.0%
h48
9.0%
o45
8.5%
a33
 
6.2%
s26
 
4.9%
u25
 
4.7%
i23
 
4.3%
n21
 
4.0%
Other values (10)70
13.2%
Decimal Number
ValueCountFrequency (%)
141111
19.7%
228965
13.8%
027444
13.1%
325191
12.0%
520793
9.9%
417541
8.4%
814108
 
6.7%
912959
 
6.2%
611157
 
5.3%
79919
 
4.7%
Other Punctuation
ValueCountFrequency (%)
,9393
75.2%
.2143
 
17.1%
#903
 
7.2%
'31
 
0.2%
&16
 
0.1%
/6
 
< 0.1%
;2
 
< 0.1%
@2
 
< 0.1%
Space Separator
ValueCountFrequency (%)
162407
100.0%
Dash Punctuation
ValueCountFrequency (%)
-3823
100.0%
Open Punctuation
ValueCountFrequency (%)
(182
100.0%
Close Punctuation
ValueCountFrequency (%)
)176
100.0%
Modifier Symbol
ValueCountFrequency (%)
`16
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin647940
62.5%
Common388288
37.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
E113425
17.5%
T94290
14.6%
S57901
8.9%
A53124
 
8.2%
R48909
 
7.5%
N35419
 
5.5%
O32644
 
5.0%
H27794
 
4.3%
U24572
 
3.8%
I23125
 
3.6%
Other values (36)136737
21.1%
Common
ValueCountFrequency (%)
162407
41.8%
141111
 
10.6%
228965
 
7.5%
027444
 
7.1%
325191
 
6.5%
520793
 
5.4%
417541
 
4.5%
814108
 
3.6%
912959
 
3.3%
611157
 
2.9%
Other values (13)26612
 
6.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII1036228
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
162407
15.7%
E113425
 
10.9%
T94290
 
9.1%
S57901
 
5.6%
A53124
 
5.1%
R48909
 
4.7%
141111
 
4.0%
N35419
 
3.4%
O32644
 
3.2%
228965
 
2.8%
Other values (59)368033
35.5%

QEWI_CITY
Categorical

HIGH CARDINALITY
MISSING

Distinct541
Distinct (%)1.1%
Missing12887
Missing (%)20.3%
Memory size496.8 KiB
NEW YORK
29603 
YONKERS
 
2173
NY
 
1797
BROOKLYN
 
1777
BAYSIDE
 
1046
Other values (536)
14295 

Length

Max length18
Median length8
Mean length8.08244067
Min length2

Characters and Unicode

Total characters409707
Distinct characters56
Distinct categories6
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique209
Unique (%)0.4%

Sample

1st rowNEW YORK
2nd rowNEW YORK
3rd rowNEW YORK
4th rowBAYSIDE
5th rowNEW YORK

Common Values

Value               Count  Frequency (%)
NEW YORK            29603  46.6%
YONKERS             2173   3.4%
NY                  1797   2.8%
BROOKLYN            1777   2.8%
BAYSIDE             1046   1.6%
FLUSHING            694    1.1%
STATEN ISLAND       603    0.9%
NEW ROCHELLE        546    0.9%
GREAT NECK          475    0.7%
LONG ISLAND CITY    401    0.6%
Other values (531)  11576  18.2%
(Missing)           12887  20.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
new30502
35.2%
york29747
34.3%
yonkers2178
 
2.5%
ny1805
 
2.1%
brooklyn1779
 
2.1%
island1155
 
1.3%
bayside1047
 
1.2%
city872
 
1.0%
flushing694
 
0.8%
staten606
 
0.7%
Other values (511)16274
18.8%

Most occurring characters

ValueCountFrequency (%)
N45747
11.2%
E45745
11.2%
O44636
10.9%
R40948
10.0%
Y39788
9.7%
35992
8.8%
K35691
8.7%
W32773
8.0%
A11444
 
2.8%
S11443
 
2.8%
Other values (46)65500
16.0%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter372891
91.0%
Space Separator35992
 
8.8%
Other Punctuation502
 
0.1%
Lowercase Letter279
 
0.1%
Decimal Number42
 
< 0.1%
Dash Punctuation1
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
N45747
12.3%
E45745
12.3%
O44636
12.0%
R40948
11.0%
Y39788
10.7%
K35691
9.6%
W32773
8.8%
A11444
 
3.1%
S11443
 
3.1%
L10860
 
2.9%
Other values (16)53816
14.4%
Lowercase Letter
ValueCountFrequency (%)
o51
18.3%
e48
17.2%
r41
14.7%
k38
13.6%
w37
13.3%
t9
 
3.2%
l9
 
3.2%
n9
 
3.2%
y7
 
2.5%
s6
 
2.2%
Other values (10)24
8.6%
Decimal Number
ValueCountFrequency (%)
226
61.9%
513
31.0%
01
 
2.4%
31
 
2.4%
81
 
2.4%
Other Punctuation
ValueCountFrequency (%)
.481
95.8%
,20
 
4.0%
'1
 
0.2%
Space Separator
ValueCountFrequency (%)
35992
100.0%
Dash Punctuation
ValueCountFrequency (%)
-1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin373170
91.1%
Common36537
 
8.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
N45747
12.3%
E45745
12.3%
O44636
12.0%
R40948
11.0%
Y39788
10.7%
K35691
9.6%
W32773
8.8%
A11444
 
3.1%
S11443
 
3.1%
L10860
 
2.9%
Other values (36)54095
14.5%
Common
ValueCountFrequency (%)
35992
98.5%
.481
 
1.3%
226
 
0.1%
,20
 
0.1%
513
 
< 0.1%
01
 
< 0.1%
31
 
< 0.1%
81
 
< 0.1%
-1
 
< 0.1%
'1
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII409707
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
N45747
11.2%
E45745
11.2%
O44636
10.9%
R40948
10.0%
Y39788
9.7%
35992
8.8%
K35691
8.7%
W32773
8.0%
A11444
 
2.8%
S11443
 
2.8%
Other values (46)65500
16.0%

QEWI_STATE
Categorical

MISSING

Distinct16
Distinct (%)< 0.1%
Missing12395
Missing (%)19.5%
Memory size496.8 KiB
NY
46964 
NJ
 
3548
CT
 
419
N.
 
112
VA
 
41
Other values (11)
 
99

Length

Max length7
Median length2
Mean length2.000195377
Min length2

Characters and Unicode

Total characters102376
Distinct characters24
Distinct categories3
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique3
Unique (%)< 0.1%

Sample

1st rowNY
2nd rowNY
3rd rowNY
4th rowNY
5th rowNY

Common Values

Value             Count  Frequency (%)
NY                46964  73.9%
NJ                3548   5.6%
CT                419    0.7%
N.                112    0.2%
VA                41     0.1%
MD                25     < 0.1%
FL                24     < 0.1%
PA                18     < 0.1%
IL                14     < 0.1%
NE                6      < 0.1%
Other values (6)  12     < 0.1%
(Missing)         12395  19.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
ny46964
91.8%
nj3548
 
6.9%
ct419
 
0.8%
n112
 
0.2%
va41
 
0.1%
md25
 
< 0.1%
fl24
 
< 0.1%
pa18
 
< 0.1%
il14
 
< 0.1%
ne6
 
< 0.1%
Other values (6)12
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
N50635
49.5%
Y46964
45.9%
J3548
 
3.5%
C422
 
0.4%
T420
 
0.4%
.112
 
0.1%
A60
 
0.1%
V41
 
< 0.1%
L38
 
< 0.1%
D32
 
< 0.1%
Other values (14)104
 
0.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter102252
99.9%
Other Punctuation112
 
0.1%
Lowercase Letter12
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
N50635
49.5%
Y46964
45.9%
J3548
 
3.5%
C422
 
0.4%
T420
 
0.4%
A60
 
0.1%
V41
 
< 0.1%
L38
 
< 0.1%
D32
 
< 0.1%
F26
 
< 0.1%
Other values (7)66
 
0.1%
Lowercase Letter
ValueCountFrequency (%)
l2
16.7%
o2
16.7%
r2
16.7%
i2
16.7%
d2
16.7%
a2
16.7%
Other Punctuation
ValueCountFrequency (%)
.112
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin102264
99.9%
Common112
 
0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
N50635
49.5%
Y46964
45.9%
J3548
 
3.5%
C422
 
0.4%
T420
 
0.4%
A60
 
0.1%
V41
 
< 0.1%
L38
 
< 0.1%
D32
 
< 0.1%
F26
 
< 0.1%
Other values (13)78
 
0.1%
Common
ValueCountFrequency (%)
.112
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII102376
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
N50635
49.5%
Y46964
45.9%
J3548
 
3.5%
C422
 
0.4%
T420
 
0.4%
.112
 
0.1%
A60
 
0.1%
V41
 
< 0.1%
L38
 
< 0.1%
D32
 
< 0.1%
Other values (14)104
 
0.1%

QEWI_ZIP
Categorical

HIGH CARDINALITY
MISSING

Distinct233
Distinct (%)1.2%
Missing44195
Missing (%)69.5%
Memory size496.8 KiB
10001
3644 
10018
2526 
10016
 
1032
10013
 
795
10701
 
734
Other values (228)
10652 

Length

Max length10
Median length5
Mean length5.026466491
Min length2

Characters and Unicode

Total characters97428
Distinct characters13
Distinct categories3
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique26
Unique (%)0.1%

Sample

1st rowNY
2nd row07666
3rd rowNY
4th row10013
5th row11563
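
The sample above contains "NY" in a ZIP column, which suggests some state values were entered in the wrong field. A minimal sketch for flagging such rows follows; it assumes valid entries are five-digit US ZIP codes, optionally in ZIP+4 form (the reported maximum length of 10 is consistent with that, but the format is an assumption). The names `ZIP_PATTERN` and `is_valid_zip` are hypothetical.

```python
import re

# Assumed format: five digits, optionally followed by "-" and four digits (ZIP+4).
ZIP_PATTERN = re.compile(r"\d{5}(-\d{4})?")


def is_valid_zip(value):
    """Return True if the value looks like a US ZIP or ZIP+4 code."""
    return value is not None and ZIP_PATTERN.fullmatch(value.strip()) is not None


# The five sample rows shown above.
sample = ["NY", "07666", "NY", "10013", "11563"]
invalid = [value for value in sample if not is_valid_zip(value)]
print(invalid)
```

Keeping the values as strings rather than integers preserves leading zeros such as the one in 07666.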

Common Values

Value               Count  Frequency (%)
10001               3644   5.7%
10018               2526   4.0%
10016               1032   1.6%
10013               795    1.3%
10701               734    1.2%
10011               613    1.0%
10010               590    0.9%
11358               463    0.7%
10025               403    0.6%
10017               391    0.6%
Other values (223)  8192   12.9%
(Missing)           44195  69.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
100013644
18.8%
100182526
 
13.0%
100161032
 
5.3%
10013795
 
4.1%
10701734
 
3.8%
10011613
 
3.2%
10010590
 
3.0%
11358463
 
2.4%
10025403
 
2.1%
10017391
 
2.0%
Other values (223)8192
42.3%

Most occurring characters

ValueCountFrequency (%)
036594
37.6%
134692
35.6%
74801
 
4.9%
84515
 
4.6%
34390
 
4.5%
23797
 
3.9%
53416
 
3.5%
63085
 
3.2%
41108
 
1.1%
9940
 
1.0%
Other values (3)90
 
0.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number97338
99.9%
Dash Punctuation86
 
0.1%
Uppercase Letter4
 
< 0.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
036594
37.6%
134692
35.6%
74801
 
4.9%
84515
 
4.6%
34390
 
4.5%
23797
 
3.9%
53416
 
3.5%
63085
 
3.2%
41108
 
1.1%
9940
 
1.0%
Uppercase Letter
ValueCountFrequency (%)
N2
50.0%
Y2
50.0%
Dash Punctuation
ValueCountFrequency (%)
-86
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common97424
> 99.9%
Latin4
 
< 0.1%

Most frequent character per script

Common
ValueCountFrequency (%)
036594
37.6%
134692
35.6%
74801
 
4.9%
84515
 
4.6%
34390
 
4.5%
23797
 
3.9%
53416
 
3.5%
63085
 
3.2%
41108
 
1.1%
9940
 
1.0%
Latin
ValueCountFrequency (%)
N2
50.0%
Y2
50.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII97428
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
036594
37.6%
134692
35.6%
74801
 
4.9%
84515
 
4.6%
34390
 
4.5%
23797
 
3.9%
53416
 
3.5%
63085
 
3.2%
41108
 
1.1%
9940
 
1.0%
Other values (3)90
 
0.1%

QEWI_NYS_LIC_NO
Categorical

HIGH CARDINALITY
MISSING

Distinct471
Distinct (%)2.4%
Missing44173
Missing (%)69.5%
Memory size496.8 KiB
RA - 014327
 
632
PE - 088730
 
419
PE - 088575
 
393
RA - 031877
 
366
PE - 058384
 
344
Other values (466)
17251 

Length

Max length11
Median length11
Mean length10.9996908
Min length10

Characters and Unicode

Total characters213449
Distinct characters18
Distinct categories4
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique63
Unique (%)0.3%

Sample

1st rowPE - 079953
2nd rowPE - 079953
3rd rowPE - 079953
4th rowL - 611208
5th rowRA - 039518
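
The sample values follow a "prefix - number" layout. A minimal sketch for splitting them into two fields is shown below; the pattern (one or two uppercase letters, a dash, six digits) is inferred from the rows above and is an assumption, and `parse_license` is a hypothetical helper.

```python
import re

# Assumed layout: 1-2 uppercase letters, optional spaces, "-", optional spaces, 6 digits.
LICENSE_PATTERN = re.compile(r"([A-Z]{1,2})\s*-\s*(\d{6})")


def parse_license(value):
    """Split a licence string into (prefix, number); return None if it does not match."""
    match = LICENSE_PATTERN.fullmatch(value.strip())
    if match is None:
        return None
    # The number stays a string so leading zeros (e.g. "079953") are preserved.
    return match.group(1), match.group(2)


parsed = [parse_license(v) for v in ["PE - 079953", "L - 611208", "RA - 039518"]]
print(parsed)
```

Values that return `None` would need manual review rather than automatic correction.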

Common Values

Value               Count  Frequency (%)
RA - 014327         632    1.0%
PE - 088730         419    0.7%
PE - 088575         393    0.6%
RA - 031877         366    0.6%
PE - 058384         344    0.5%
PE - 067415         341    0.5%
PE - 048838         338    0.5%
RA - 017183         325    0.5%
PE - 088950         321    0.5%
PE - 084457         309    0.5%
Other values (461)  15617  24.6%
(Missing)           44173  69.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
19405
33.3%
pe10747
18.5%
ra8652
14.9%
014327632
 
1.1%
088730419
 
0.7%
088575393
 
0.7%
031877366
 
0.6%
058384344
 
0.6%
067415341
 
0.6%
048838338
 
0.6%
Other values (466)16578
28.5%

Most occurring characters

ValueCountFrequency (%)
38810
18.2%
027442
12.9%
-19405
 
9.1%
812375
 
5.8%
711253
 
5.3%
310765
 
5.0%
P10747
 
5.0%
E10747
 
5.0%
110034
 
4.7%
410005
 
4.7%
Other values (8)51866
24.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number116430
54.5%
Space Separator38810
 
18.2%
Uppercase Letter38804
 
18.2%
Dash Punctuation19405
 
9.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
027442
23.6%
812375
10.6%
711253
9.7%
310765
 
9.2%
110034
 
8.6%
410005
 
8.6%
29480
 
8.1%
58683
 
7.5%
98599
 
7.4%
67794
 
6.7%
Uppercase Letter
ValueCountFrequency (%)
P10747
27.7%
E10747
27.7%
R8653
22.3%
A8652
22.3%
X4
 
< 0.1%
L1
 
< 0.1%
Space Separator
ValueCountFrequency (%)
38810
100.0%
Dash Punctuation
ValueCountFrequency (%)
-19405
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common174645
81.8%
Latin38804
 
18.2%

Most frequent character per script

Common
ValueCountFrequency (%)
38810
22.2%
027442
15.7%
-19405
11.1%
812375
 
7.1%
711253
 
6.4%
310765
 
6.2%
110034
 
5.7%
410005
 
5.7%
29480
 
5.4%
58683
 
5.0%
Other values (2)16393
9.4%
Latin
ValueCountFrequency (%)
P10747
27.7%
E10747
27.7%
R8653
22.3%
A8652
22.3%
X4
 
< 0.1%
L1
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII213449
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
38810
18.2%
027442
12.9%
-19405
 
9.1%
812375
 
5.8%
711253
 
5.3%
310765
 
5.0%
P10747
 
5.0%
E10747
 
5.0%
110034
 
4.7%
410005
 
4.7%
Other values (8)51866
24.3%

OWNER_NAME
Categorical

HIGH CARDINALITY
MISSING

Distinct6058
Distinct (%)31.2%
Missing44158
Missing (%)69.5%
Memory size496.8 KiB
LLOYD VALDEZ
 
734
GARY GUILLAUME
 
539
RICHARD MORRISON
 
209
MARTHA BRAZOBAN
 
202
EDWARD MCARTHUR
 
172
Other values (6053)
17564 

Length

Max length51
Median length39
Mean length14.58470649
Min length6

Characters and Unicode

Total characters283235
Distinct characters70
Distinct categories9
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique3271
Unique (%)16.8%

Sample

1st rowGOLDSTEIN STUART
2nd rowJEFF FARKAS
3rd rowETEM BIZATI
4th rowNUTTUALL ELEANOR
5th rowGAZIVODA ANTHONY

Common Values

Value                     Count  Frequency (%)
LLOYD VALDEZ              734    1.2%
GARY GUILLAUME            539    0.8%
RICHARD MORRISON          209    0.3%
MARTHA BRAZOBAN           202    0.3%
EDWARD MCARTHUR           172    0.3%
MICHAEL WOLFE             113    0.2%
MARY FRANCES SHAUGHNESSY  108    0.2%
PHILLIP WISCHERTH         100    0.2%
JUAN R. TORRES            97     0.2%
FIRSTSERVICE RESIDENTIAL  83     0.1%
Other values (6048)       17063  26.8%
(Missing)                 44158  69.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
lloyd781
 
2.0%
valdez736
 
1.8%
michael682
 
1.7%
gary576
 
1.4%
guillaume542
 
1.4%
richard486
 
1.2%
david465
 
1.2%
john404
 
1.0%
joseph294
 
0.7%
robert288
 
0.7%
Other values (6205)34590
86.8%

Most occurring characters

ValueCountFrequency (%)
39569
14.0%
A28804
 
10.2%
E24719
 
8.7%
R20756
 
7.3%
N17780
 
6.3%
I16515
 
5.8%
L15745
 
5.6%
O14796
 
5.2%
S12930
 
4.6%
T9780
 
3.5%
Other values (60)81841
28.9%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter242283
85.5%
Space Separator39569
 
14.0%
Decimal Number624
 
0.2%
Other Punctuation407
 
0.1%
Lowercase Letter270
 
0.1%
Dash Punctuation75
 
< 0.1%
Open Punctuation3
 
< 0.1%
Close Punctuation3
 
< 0.1%
Modifier Symbol1
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A28804
11.9%
E24719
 
10.2%
R20756
 
8.6%
N17780
 
7.3%
I16515
 
6.8%
L15745
 
6.5%
O14796
 
6.1%
S12930
 
5.3%
T9780
 
4.0%
M9548
 
3.9%
Other values (16)70910
29.3%
Lowercase Letter
ValueCountFrequency (%)
o47
17.4%
r36
13.3%
k30
11.1%
e29
10.7%
w20
7.4%
n18
 
6.7%
a17
 
6.3%
l13
 
4.8%
y13
 
4.8%
t7
 
2.6%
Other values (12)40
14.8%
Decimal Number
ValueCountFrequency (%)
181
13.0%
281
13.0%
069
11.1%
463
10.1%
660
9.6%
858
9.3%
755
8.8%
354
8.7%
952
8.3%
551
8.2%
Other Punctuation
ValueCountFrequency (%)
.300
73.7%
'53
 
13.0%
,21
 
5.2%
/16
 
3.9%
?11
 
2.7%
&3
 
0.7%
;3
 
0.7%
Space Separator
ValueCountFrequency (%)
39569
100.0%
Dash Punctuation
ValueCountFrequency (%)
-75
100.0%
Open Punctuation
ValueCountFrequency (%)
(3
100.0%
Close Punctuation
ValueCountFrequency (%)
)3
100.0%
Modifier Symbol
ValueCountFrequency (%)
`1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin242553
85.6%
Common40682
 
14.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
A28804
11.9%
E24719
 
10.2%
R20756
 
8.6%
N17780
 
7.3%
I16515
 
6.8%
L15745
 
6.5%
O14796
 
6.1%
S12930
 
5.3%
T9780
 
4.0%
M9548
 
3.9%
Other values (38)71180
29.3%
Common
ValueCountFrequency (%)
39569
97.3%
.300
 
0.7%
181
 
0.2%
281
 
0.2%
-75
 
0.2%
069
 
0.2%
463
 
0.2%
660
 
0.1%
858
 
0.1%
755
 
0.1%
Other values (12)271
 
0.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII283235
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
39569
14.0%
A28804
 
10.2%
E24719
 
8.7%
R20756
 
7.3%
N17780
 
6.3%
I16515
 
5.8%
L15745
 
5.6%
O14796
 
5.2%
S12930
 
4.6%
T9780
 
3.5%
Other values (60)81841
28.9%

OWNER_BUS_NAME
Categorical

HIGH CARDINALITY
MISSING

Distinct26232
Distinct (%)50.7%
Missing11812
Missing (%)18.6%
Memory size496.8 KiB
N.Y.C.H.A.
 
2262
NEW YORK CITY HOUSING AUTHORITY
 
2119
NYCHA
 
1600
PR
 
1160
NYC HOUSING AUTHORITY
 
712
Other values (26227)
43913 

Length

Max length100
Median length80
Mean length21.17523085
Min length1

Characters and Unicode

Total characters1096157
Distinct characters85
Distinct categories12
Distinct scripts2
Distinct blocks2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique18083
Unique (%)34.9%

Sample

1st rowBROOKFIELD PROPERTIES
2nd row62 COOPER SQUARE CONDOMINIUM
3rd rowONE STATE STREET, LLC
4th rowVERIZON NEW YORK, INC
5th rowCUSHMAN & WAKEFIELD INC

Common Values

Value                                Count  Frequency (%)
N.Y.C.H.A.                           2262   3.6%
NEW YORK CITY HOUSING AUTHORITY      2119   3.3%
NYCHA                                1600   2.5%
PR                                   1160   1.8%
NYC HOUSING AUTHORITY                712    1.1%
COLUMBIA UNIVERSITY                  296    0.5%
NEW YORK UNIVERSITY                  188    0.3%
N.Y.C.H.A                            125    0.2%
PARKCHESTER NORTH CONDOMINIUM        103    0.2%
PARKCHESTER SOUTH CONDOMINIUM, INC.  97     0.2%
Other values (26222)                 43104  67.8%
(Missing)                            11812  18.6%
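
Several of the most common values above look like spelling variants of one owner, which inflates the distinct count. The following is a minimal sketch of one way to collapse them. The alias table is hypothetical, and treating these spellings as the same entity is an assumption that should be checked against the source data.

```python
import re

# Hypothetical alias table: maps a cleaned spelling to a canonical name.
ALIASES = {
    "NYCHA": "NEW YORK CITY HOUSING AUTHORITY",
    "NYC HOUSING AUTHORITY": "NEW YORK CITY HOUSING AUTHORITY",
}


def normalize_owner(value):
    """Uppercase, drop periods, collapse whitespace, then apply the alias table."""
    text = value.upper().replace(".", "")
    text = re.sub(r"\s+", " ", text).strip()
    return ALIASES.get(text, text)


variants = ["N.Y.C.H.A.", "N.Y.C.H.A", "NYCHA", "NYC HOUSING AUTHORITY"]
canonical = {normalize_owner(v) for v in variants}
print(canonical)
```

Names without an alias entry pass through with only the punctuation and whitespace cleanup applied.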

Length

Histogram of lengths of the category
ValueCountFrequency (%)
llc9196
 
5.2%
corp8192
 
4.6%
inc4276
 
2.4%
owners3820
 
2.2%
housing3819
 
2.2%
realty3798
 
2.1%
street3452
 
1.9%
new2937
 
1.7%
authority2900
 
1.6%
york2806
 
1.6%
Other values (11022)132324
74.5%

Most occurring characters

ValueCountFrequency (%)
125931
 
11.5%
E73560
 
6.7%
O60938
 
5.6%
R59903
 
5.5%
T59813
 
5.5%
A59217
 
5.4%
N58097
 
5.3%
C54491
 
5.0%
S49473
 
4.5%
I44528
 
4.1%
Other values (75)450206
41.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter761546
69.5%
Space Separator125931
 
11.5%
Lowercase Letter118681
 
10.8%
Decimal Number57518
 
5.2%
Other Punctuation30122
 
2.7%
Dash Punctuation2197
 
0.2%
Open Punctuation60
 
< 0.1%
Close Punctuation55
 
< 0.1%
Math Symbol33
 
< 0.1%
Other Number11
 
< 0.1%
Other values (2)3
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e15576
13.1%
o11185
9.4%
n11146
9.4%
t10992
9.3%
r10672
9.0%
a9436
 
8.0%
s7868
 
6.6%
i7708
 
6.5%
l4209
 
3.5%
m4070
 
3.4%
Other values (17)25819
21.8%
Uppercase Letter
ValueCountFrequency (%)
E73560
 
9.7%
O60938
 
8.0%
R59903
 
7.9%
T59813
 
7.9%
A59217
 
7.8%
N58097
 
7.6%
C54491
 
7.2%
S49473
 
6.5%
I44528
 
5.8%
L43456
 
5.7%
Other values (16)198070
26.0%
Other Punctuation
ValueCountFrequency (%)
.21375
71.0%
,5619
 
18.7%
/1410
 
4.7%
&1021
 
3.4%
'404
 
1.3%
;135
 
0.4%
#114
 
0.4%
@19
 
0.1%
¿11
 
< 0.1%
%7
 
< 0.1%
Other values (3)7
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
110312
17.9%
27276
12.6%
06892
12.0%
56802
11.8%
35923
10.3%
44928
8.6%
74363
7.6%
64088
 
7.1%
83671
 
6.4%
93263
 
5.7%
Math Symbol
ValueCountFrequency (%)
+32
97.0%
<1
 
3.0%
Space Separator
ValueCountFrequency (%)
125931
100.0%
Dash Punctuation
ValueCountFrequency (%)
-2197
100.0%
Open Punctuation
ValueCountFrequency (%)
(60
100.0%
Close Punctuation
ValueCountFrequency (%)
)55
100.0%
Other Number
ValueCountFrequency (%)
½11
100.0%
Modifier Symbol
ValueCountFrequency (%)
`2
100.0%
Connector Punctuation
ValueCountFrequency (%)
_1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin880227
80.3%
Common215930
 
19.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
E73560
 
8.4%
O60938
 
6.9%
R59903
 
6.8%
T59813
 
6.8%
A59217
 
6.7%
N58097
 
6.6%
C54491
 
6.2%
S49473
 
5.6%
I44528
 
5.1%
L43456
 
4.9%
Other values (43)316751
36.0%
Common
ValueCountFrequency (%)
125931
58.3%
.21375
 
9.9%
110312
 
4.8%
27276
 
3.4%
06892
 
3.2%
56802
 
3.2%
35923
 
2.7%
,5619
 
2.6%
44928
 
2.3%
74363
 
2.0%
Other values (22)16509
 
7.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII1096124
> 99.9%
None33
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
125931
 
11.5%
E73560
 
6.7%
O60938
 
5.6%
R59903
 
5.5%
T59813
 
5.5%
A59217
 
5.4%
N58097
 
5.3%
C54491
 
5.0%
S49473
 
4.5%
I44528
 
4.1%
Other values (72)450173
41.1%
None
ValueCountFrequency (%)
ï11
33.3%
¿11
33.3%
½11
33.3%
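
The three non-ASCII characters occur exactly 11 times each. One plausible explanation (an assumption, not something the report states) is that the Unicode replacement character U+FFFD was written as UTF-8 and later read back as Latin-1, producing the sequence "ï¿½". The sketch below shows that round trip; `count_mojibake` and the example string are hypothetical.

```python
import unicodedata

# U+FFFD encodes to the bytes EF BF BD in UTF-8.
REPLACEMENT_BYTES = "\ufffd".encode("utf-8")
# Reading those bytes as Latin-1 yields three separate characters.
MOJIBAKE = REPLACEMENT_BYTES.decode("latin-1")


def count_mojibake(values):
    """Count occurrences of the three-character sequence in non-missing values."""
    return sum(value.count(MOJIBAKE) for value in values if value)


# General categories of the three characters: letter, punctuation, number.
categories = [unicodedata.category(ch) for ch in MOJIBAKE]
print(MOJIBAKE, categories)

# Hypothetical example value, not taken from the dataset.
print(count_mojibake(["ABC \u00ef\u00bf\u00bd LLC", None, "XYZ CORP"]))
```

If this explanation holds, the original characters were already lost before the misread, so the affected rows can be flagged but not restored.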

OWNER_BUS_STREET_NAME
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing63578
Missing (%)100.0%
Memory size496.8 KiB

OWNER_CITY
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing: 63578
Missing (%): 100.0%
Memory size: 496.8 KiB

OWNER_ZIP
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing: 63578
Missing (%): 100.0%
Memory size: 496.8 KiB

OWNER_STATE
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing: 63578
Missing (%): 100.0%
Memory size: 496.8 KiB

FILING_DATE
Categorical

HIGH CARDINALITY
MISSING

Distinct: 4464
Distinct (%): 8.8%
Missing: 12782
Missing (%): 20.1%
Memory size: 496.8 KiB

Value | Count
02/21/2007 12:00:00 AM | 1215
02/20/2007 12:00:00 AM | 733
02/21/2012 12:00:00 AM | 642
02/21/2022 12:00:00 AM | 501
02/21/2017 12:00:00 AM | 464
Other values (4459) | 47241

Length

Max length: 22
Median length: 22
Mean length: 22
Min length: 22

Characters and Unicode

Total characters: 1117512
Distinct characters: 16
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
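
As a minimal sketch of how such per-category counts can be obtained, the standard-library `unicodedata` module returns the general category of each character. The helper name `category_counts` is hypothetical and is not taken from the tool that produced this report. The two-letter codes correspond to the category names used in the tables: `Nd` is Decimal Number, `Po` is Other Punctuation, `Zs` is Space Separator and `Lu` is Uppercase Letter.

```python
import unicodedata
from collections import Counter

def category_counts(text: str) -> Counter:
    """Count the characters of `text` by Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)

# One FILING_DATE value as it appears in the sample rows of this report.
counts = category_counts("02/21/2012 12:00:00 AM")
for code, count in sorted(counts.items()):
    print(code, count)
```

Every value in this column is 22 characters long and follows the same pattern, so the per-value shares of each category carry over to the column totals.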

Unique

Unique: 440
Unique (%): 0.9%

Sample

1st row02/21/2012 12:00:00 AM
2nd row11/07/2012 12:00:00 AM
3rd row03/26/2012 12:00:00 AM
4th row08/20/2012 12:00:00 AM
5th row11/10/2011 12:00:00 AM
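
The sample values are fixed-width strings of the form MM/DD/YYYY hh:mm:ss AM/PM, which is presumably why FILING_DATE is typed as Categorical rather than as a date. A minimal sketch of converting such strings with the Python standard library follows. The format string is inferred from the samples shown, and `parse_filing_date` is a hypothetical helper that is not part of the report.

```python
from datetime import datetime
from typing import Optional

# Format inferred from sample rows such as "02/21/2012 12:00:00 AM" (assumption).
FILING_DATE_FORMAT = "%m/%d/%Y %I:%M:%S %p"

def parse_filing_date(raw: Optional[str]) -> Optional[datetime]:
    """Parse one raw value; missing or malformed values become None."""
    if raw is None or not raw.strip():
        return None
    try:
        return datetime.strptime(raw.strip(), FILING_DATE_FORMAT)
    except ValueError:
        return None

samples = ["02/21/2012 12:00:00 AM", "11/07/2012 12:00:00 AM", None]
parsed = [parse_filing_date(s) for s in samples]
```

Returning None for unparseable input keeps missing values missing instead of raising, which suits a column with roughly a fifth of its values absent.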

Common Values

Value | Count | Frequency (%)
02/21/2007 12:00:00 AM | 1215 | 1.9%
02/20/2007 12:00:00 AM | 733 | 1.2%
02/21/2012 12:00:00 AM | 642 | 1.0%
02/21/2022 12:00:00 AM | 501 | 0.8%
02/21/2017 12:00:00 AM | 464 | 0.7%
02/18/2022 12:00:00 AM | 445 | 0.7%
02/16/2007 12:00:00 AM | 408 | 0.6%
08/21/2012 12:00:00 AM | 351 | 0.6%
02/21/2013 12:00:00 AM | 350 | 0.6%
08/20/2012 12:00:00 AM | 347 | 0.5%
Other values (4454) | 45340 | 71.3%
(Missing) | 12782 | 20.1%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
am50789
33.3%
12:00:0050626
33.2%
02/21/20071215
 
0.8%
02/20/2007733
 
0.5%
02/21/2012642
 
0.4%
02/21/2022501
 
0.3%
02/21/2017464
 
0.3%
02/18/2022445
 
0.3%
02/16/2007408
 
0.3%
08/21/2012351
 
0.2%
Other values (4454)46214
30.3%

Most occurring characters

ValueCountFrequency (%)
0332244
29.7%
2161645
14.5%
1128676
 
11.5%
/101592
 
9.1%
101592
 
9.1%
:101592
 
9.1%
M50796
 
4.5%
A50789
 
4.5%
718081
 
1.6%
814047
 
1.3%
Other values (6)56458
 
5.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number711144
63.6%
Other Punctuation203184
 
18.2%
Space Separator101592
 
9.1%
Uppercase Letter101592
 
9.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0332244
46.7%
2161645
22.7%
1128676
 
18.1%
718081
 
2.5%
814047
 
2.0%
313383
 
1.9%
612495
 
1.8%
912238
 
1.7%
510054
 
1.4%
48281
 
1.2%
Uppercase Letter
ValueCountFrequency (%)
M50796
50.0%
A50789
50.0%
P7
 
< 0.1%
Other Punctuation
ValueCountFrequency (%)
/101592
50.0%
:101592
50.0%
Space Separator
ValueCountFrequency (%)
101592
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common1015920
90.9%
Latin101592
 
9.1%

Most frequent character per script

Common
ValueCountFrequency (%)
0332244
32.7%
2161645
15.9%
1128676
 
12.7%
/101592
 
10.0%
101592
 
10.0%
:101592
 
10.0%
718081
 
1.8%
814047
 
1.4%
313383
 
1.3%
612495
 
1.2%
Other values (3)30573
 
3.0%
Latin
ValueCountFrequency (%)
M50796
50.0%
A50789
50.0%
P7
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII1117512
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0332244
29.7%
2161645
14.5%
1128676
 
11.5%
/101592
 
9.1%
101592
 
9.1%
:101592
 
9.1%
M50796
 
4.5%
A50789
 
4.5%
718081
 
1.6%
814047
 
1.3%
Other values (6)56458
 
5.1%

FILING_STATUS
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 496.8 KiB

Value | Count
SAFE | 21321
SWARMP | 19163
No Report Filed | 12779
UNSAFE | 10315

Length

Max length: 15
Median length: 6
Mean length: 7.1382711
Min length: 4

Characters and Unicode

Total characters: 453837
Distinct characters: 19
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st rowNo Report Filed
2nd rowNo Report Filed
3rd rowNo Report Filed
4th rowNo Report Filed
5th rowNo Report Filed

Common Values

Value | Count | Frequency (%)
SAFE | 21321 | 33.5%
SWARMP | 19163 | 30.1%
No Report Filed | 12779 | 20.1%
UNSAFE | 10315 | 16.2%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
safe21321
23.9%
swarmp19163
21.5%
no12779
14.3%
report12779
14.3%
filed12779
14.3%
unsafe10315
11.6%

Most occurring characters

ValueCountFrequency (%)
S50799
11.2%
A50799
11.2%
F44415
 
9.8%
R31942
 
7.0%
E31636
 
7.0%
e25558
 
5.6%
25558
 
5.6%
o25558
 
5.6%
N23094
 
5.1%
P19163
 
4.2%
Other values (9)125315
27.6%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter300489
66.2%
Lowercase Letter127790
28.2%
Space Separator25558
 
5.6%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
S50799
16.9%
A50799
16.9%
F44415
14.8%
R31942
10.6%
E31636
10.5%
N23094
7.7%
P19163
 
6.4%
M19163
 
6.4%
W19163
 
6.4%
U10315
 
3.4%
Lowercase Letter
ValueCountFrequency (%)
e25558
20.0%
o25558
20.0%
p12779
10.0%
r12779
10.0%
t12779
10.0%
i12779
10.0%
l12779
10.0%
d12779
10.0%
Space Separator
ValueCountFrequency (%)
25558
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin428279
94.4%
Common25558
 
5.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
S50799
11.9%
A50799
11.9%
F44415
10.4%
R31942
 
7.5%
E31636
 
7.4%
e25558
 
6.0%
o25558
 
6.0%
N23094
 
5.4%
P19163
 
4.5%
M19163
 
4.5%
Other values (8)106152
24.8%
Common
ValueCountFrequency (%)
25558
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII453837
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
S50799
11.2%
A50799
11.2%
F44415
 
9.8%
R31942
 
7.0%
E31636
 
7.0%
e25558
 
5.6%
25558
 
5.6%
o25558
 
5.6%
N23094
 
5.1%
P19163
 
4.2%
Other values (9)125315
27.6%

PRIOR_CYCLE_FILING_DATE
Categorical

HIGH CARDINALITY
MISSING

Distinct: 5274
Distinct (%): 12.2%
Missing: 20508
Missing (%): 32.3%
Memory size: 496.8 KiB

Value | Count
02/21/2007 12:00:00 AM | 1288
02/21/2002 12:00:00 AM | 982
02/20/2007 12:00:00 AM | 768
03/01/2000 12:00:00 AM | 664
02/29/2000 12:00:00 AM | 551
Other values (5269) | 38817

Length

Max length: 22
Median length: 22
Mean length: 22
Min length: 22

Characters and Unicode

Total characters: 947540
Distinct characters: 16
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1250
Unique (%): 2.9%

Sample

1st row12/08/2006 12:00:00 AM
2nd row02/20/2007 12:00:00 AM
3rd row10/22/2007 12:00:00 AM
4th row01/17/2007 12:00:00 AM
5th row11/09/2006 12:00:00 AM

Common Values

Value | Count | Frequency (%)
02/21/2007 12:00:00 AM | 1288 | 2.0%
02/21/2002 12:00:00 AM | 982 | 1.5%
02/20/2007 12:00:00 AM | 768 | 1.2%
03/01/2000 12:00:00 AM | 664 | 1.0%
02/29/2000 12:00:00 AM | 551 | 0.9%
02/21/2012 12:00:00 AM | 451 | 0.7%
02/16/2007 12:00:00 AM | 420 | 0.7%
02/21/2017 12:00:00 AM | 389 | 0.6%
02/28/2000 12:00:00 AM | 332 | 0.5%
02/15/2007 12:00:00 AM | 317 | 0.5%
Other values (5264) | 36908 | 58.1%
(Missing) | 20508 | 32.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
am43064
33.3%
12:00:0041879
32.4%
02/21/20071288
 
1.0%
01:00:001185
 
0.9%
02/21/2002982
 
0.8%
02/20/2007768
 
0.6%
03/01/2000664
 
0.5%
02/29/2000551
 
0.4%
02/21/2012451
 
0.3%
02/16/2007420
 
0.3%
Other values (5267)37958
29.4%

Most occurring characters

ValueCountFrequency (%)
0297624
31.4%
2131018
13.8%
199853
 
10.5%
/86140
 
9.1%
86140
 
9.1%
:86140
 
9.1%
M43070
 
4.5%
A43064
 
4.5%
715196
 
1.6%
314153
 
1.5%
Other values (6)45142
 
4.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number602980
63.6%
Other Punctuation172280
 
18.2%
Space Separator86140
 
9.1%
Uppercase Letter86140
 
9.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0297624
49.4%
2131018
21.7%
199853
 
16.6%
715196
 
2.5%
314153
 
2.3%
911073
 
1.8%
89686
 
1.6%
69192
 
1.5%
47801
 
1.3%
57384
 
1.2%
Uppercase Letter
ValueCountFrequency (%)
M43070
50.0%
A43064
50.0%
P6
 
< 0.1%
Other Punctuation
ValueCountFrequency (%)
/86140
50.0%
:86140
50.0%
Space Separator
ValueCountFrequency (%)
86140
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common861400
90.9%
Latin86140
 
9.1%

Most frequent character per script

Common
ValueCountFrequency (%)
0297624
34.6%
2131018
15.2%
199853
 
11.6%
/86140
 
10.0%
86140
 
10.0%
:86140
 
10.0%
715196
 
1.8%
314153
 
1.6%
911073
 
1.3%
89686
 
1.1%
Other values (3)24377
 
2.8%
Latin
ValueCountFrequency (%)
M43070
50.0%
A43064
50.0%
P6
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII947540
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0297624
31.4%
2131018
13.8%
199853
 
10.5%
/86140
 
9.1%
86140
 
9.1%
:86140
 
9.1%
M43070
 
4.5%
A43064
 
4.5%
715196
 
1.6%
314153
 
1.5%
Other values (6)45142
 
4.8%

PRIOR_STATUS
Categorical

HIGH CORRELATION
MISSING

Distinct: 4
Distinct (%): < 0.1%
Missing: 17851
Missing (%): 28.1%
Memory size: 496.8 KiB

Value | Count
SAFE | 19148
SWARMP | 18778
UNSAFE | 4946
No Report Filed | 2855

Length

Max length: 15
Median length: 6
Mean length: 5.724429768
Min length: 4

Characters and Unicode

Total characters: 261761
Distinct characters: 19
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st rowSWARMP
2nd rowSWARMP
3rd rowSWARMP
4th rowSAFE
5th rowSAFE

Common Values

Value | Count | Frequency (%)
SAFE | 19148 | 30.1%
SWARMP | 18778 | 29.5%
UNSAFE | 4946 | 7.8%
No Report Filed | 2855 | 4.5%
(Missing) | 17851 | 28.1%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
safe19148
37.2%
swarmp18778
36.5%
unsafe4946
 
9.6%
no2855
 
5.6%
report2855
 
5.6%
filed2855
 
5.6%

Most occurring characters

ValueCountFrequency (%)
S42872
16.4%
A42872
16.4%
F26949
10.3%
E24094
9.2%
R21633
8.3%
W18778
7.2%
M18778
7.2%
P18778
7.2%
N7801
 
3.0%
o5710
 
2.2%
Other values (9)33496
12.8%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter227501
86.9%
Lowercase Letter28550
 
10.9%
Space Separator5710
 
2.2%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
S42872
18.8%
A42872
18.8%
F26949
11.8%
E24094
10.6%
R21633
9.5%
W18778
8.3%
M18778
8.3%
P18778
8.3%
N7801
 
3.4%
U4946
 
2.2%
Lowercase Letter
ValueCountFrequency (%)
o5710
20.0%
e5710
20.0%
p2855
10.0%
r2855
10.0%
t2855
10.0%
i2855
10.0%
l2855
10.0%
d2855
10.0%
Space Separator
ValueCountFrequency (%)
5710
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin256051
97.8%
Common5710
 
2.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
S42872
16.7%
A42872
16.7%
F26949
10.5%
E24094
9.4%
R21633
8.4%
W18778
7.3%
M18778
7.3%
P18778
7.3%
N7801
 
3.0%
o5710
 
2.2%
Other values (8)27786
10.9%
Common
ValueCountFrequency (%)
5710
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII261761
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
S42872
16.4%
A42872
16.4%
F26949
10.3%
E24094
9.2%
R21633
8.3%
W18778
7.2%
M18778
7.2%
P18778
7.2%
N7801
 
3.0%
o5710
 
2.2%
Other values (9)33496
12.8%

FIELD_INSPECTION_COMPLETED_DATE
Categorical

HIGH CARDINALITY
MISSING

Distinct: 5145
Distinct (%): 11.0%
Missing: 16664
Missing (%): 26.2%
Memory size: 496.8 KiB

Value | Count
02/11/2022 12:00:00 AM | 143
02/16/2022 12:00:00 AM | 131
02/15/2022 12:00:00 AM | 131
02/18/2022 12:00:00 AM | 125
02/09/2022 12:00:00 AM | 125
Other values (5140) | 46259

Length

Max length: 22
Median length: 22
Mean length: 22
Min length: 22

Characters and Unicode

Total characters: 1032108
Distinct characters: 16
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 773
Unique (%): 1.6%

Sample

1st row02/10/2012 12:00:00 AM
2nd row10/04/2012 12:00:00 AM
3rd row01/23/2012 12:00:00 AM
4th row07/25/2012 12:00:00 AM
5th row10/19/2011 12:00:00 AM

Common Values

Value | Count | Frequency (%)
02/11/2022 12:00:00 AM | 143 | 0.2%
02/16/2022 12:00:00 AM | 131 | 0.2%
02/15/2022 12:00:00 AM | 131 | 0.2%
02/18/2022 12:00:00 AM | 125 | 0.2%
02/09/2022 12:00:00 AM | 125 | 0.2%
11/01/2006 01:00:00 AM | 125 | 0.2%
02/15/2012 12:00:00 AM | 120 | 0.2%
02/08/2022 12:00:00 AM | 118 | 0.2%
12/01/2006 12:00:00 AM | 117 | 0.2%
02/10/2022 12:00:00 AM | 113 | 0.2%
Other values (5135) | 45666 | 71.8%
(Missing) | 16664 | 26.2%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
am46905
33.3%
12:00:0046302
32.9%
01:00:00603
 
0.4%
02/11/2022143
 
0.1%
02/16/2022131
 
0.1%
02/15/2022131
 
0.1%
02/18/2022125
 
0.1%
02/09/2022125
 
0.1%
11/01/2006125
 
0.1%
02/15/2012120
 
0.1%
Other values (5139)46032
32.7%

Most occurring characters

ValueCountFrequency (%)
0307188
29.8%
2139481
13.5%
1125134
12.1%
/93828
 
9.1%
93828
 
9.1%
:93828
 
9.1%
M46914
 
4.5%
A46905
 
4.5%
715264
 
1.5%
614609
 
1.4%
Other values (6)55129
 
5.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number656796
63.6%
Other Punctuation187656
 
18.2%
Space Separator93828
 
9.1%
Uppercase Letter93828
 
9.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0307188
46.8%
2139481
21.2%
1125134
19.1%
715264
 
2.3%
614609
 
2.2%
812604
 
1.9%
312491
 
1.9%
911513
 
1.8%
510865
 
1.7%
47647
 
1.2%
Uppercase Letter
ValueCountFrequency (%)
M46914
50.0%
A46905
50.0%
P9
 
< 0.1%
Other Punctuation
ValueCountFrequency (%)
/93828
50.0%
:93828
50.0%
Space Separator
ValueCountFrequency (%)
93828
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common938280
90.9%
Latin93828
 
9.1%

Most frequent character per script

Common
ValueCountFrequency (%)
0307188
32.7%
2139481
14.9%
1125134
13.3%
/93828
 
10.0%
93828
 
10.0%
:93828
 
10.0%
715264
 
1.6%
614609
 
1.6%
812604
 
1.3%
312491
 
1.3%
Other values (3)30025
 
3.2%
Latin
ValueCountFrequency (%)
M46914
50.0%
A46905
50.0%
P9
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII1032108
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0307188
29.8%
2139481
13.5%
1125134
12.1%
/93828
 
9.1%
93828
 
9.1%
:93828
 
9.1%
M46914
 
4.5%
A46905
 
4.5%
715264
 
1.5%
614609
 
1.4%
Other values (6)55129
 
5.3%

QEWI_SIGNED_DATE
Categorical

HIGH CARDINALITY
MISSING

Distinct: 5198
Distinct (%): 11.3%
Missing: 17763
Missing (%): 27.9%
Memory size: 496.8 KiB

Value | Count
02/15/2007 12:00:00 AM | 395
02/20/2007 12:00:00 AM | 376
02/18/2022 12:00:00 AM | 355
02/16/2007 12:00:00 AM | 351
02/14/2007 12:00:00 AM | 293
Other values (5193) | 44045

Length

Max length: 22
Median length: 22
Mean length: 22
Min length: 22

Characters and Unicode

Total characters: 1007930
Distinct characters: 16
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 773
Unique (%): 1.7%

Sample

1st row02/12/2012 12:00:00 AM
2nd row10/25/2012 12:00:00 AM
3rd row03/15/2012 12:00:00 AM
4th row08/17/2012 12:00:00 AM
5th row10/28/2011 12:00:00 AM

Common Values

Value | Count | Frequency (%)
02/15/2007 12:00:00 AM | 395 | 0.6%
02/20/2007 12:00:00 AM | 376 | 0.6%
02/18/2022 12:00:00 AM | 355 | 0.6%
02/16/2007 12:00:00 AM | 351 | 0.6%
02/14/2007 12:00:00 AM | 293 | 0.5%
02/21/2022 12:00:00 AM | 286 | 0.4%
02/12/2007 12:00:00 AM | 255 | 0.4%
02/13/2007 12:00:00 AM | 225 | 0.4%
02/16/2012 12:00:00 AM | 206 | 0.3%
08/16/2012 12:00:00 AM | 198 | 0.3%
Other values (5188) | 42875 | 67.4%
(Missing) | 17763 | 27.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
am35735
26.0%
12:00:0035506
25.8%
pm10080
 
7.3%
07:00:005200
 
3.8%
08:00:004874
 
3.5%
02/15/2007395
 
0.3%
02/20/2007376
 
0.3%
02/18/2022355
 
0.3%
02/16/2007351
 
0.3%
02/14/2007293
 
0.2%
Other values (5066)44280
32.2%

Most occurring characters

ValueCountFrequency (%)
0310968
30.9%
2129244
12.8%
1108061
 
10.7%
/91630
 
9.1%
91630
 
9.1%
:91630
 
9.1%
M45815
 
4.5%
A35735
 
3.5%
722286
 
2.2%
817403
 
1.7%
Other values (6)63528
 
6.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number641410
63.6%
Other Punctuation183260
 
18.2%
Space Separator91630
 
9.1%
Uppercase Letter91630
 
9.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0310968
48.5%
2129244
20.1%
1108061
 
16.8%
722286
 
3.5%
817403
 
2.7%
612465
 
1.9%
312413
 
1.9%
911131
 
1.7%
510081
 
1.6%
47358
 
1.1%
Uppercase Letter
ValueCountFrequency (%)
M45815
50.0%
A35735
39.0%
P10080
 
11.0%
Other Punctuation
ValueCountFrequency (%)
/91630
50.0%
:91630
50.0%
Space Separator
ValueCountFrequency (%)
91630
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common916300
90.9%
Latin91630
 
9.1%

Most frequent character per script

Common
ValueCountFrequency (%)
0310968
33.9%
2129244
14.1%
1108061
 
11.8%
/91630
 
10.0%
91630
 
10.0%
:91630
 
10.0%
722286
 
2.4%
817403
 
1.9%
612465
 
1.4%
312413
 
1.4%
Other values (3)28570
 
3.1%
Latin
ValueCountFrequency (%)
M45815
50.0%
A35735
39.0%
P10080
 
11.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII1007930
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0310968
30.9%
2129244
12.8%
1108061
 
10.7%
/91630
 
9.1%
91630
 
9.1%
:91630
 
9.1%
M45815
 
4.5%
A35735
 
3.5%
722286
 
2.2%
817403
 
1.7%
Other values (6)63528
 
6.3%

LATE_FILING_AMT
Real number (ℝ≥0)

HIGH CORRELATION
MISSING
ZEROS

Distinct: 673
Distinct (%): 1.1%
Missing: 1385
Missing (%): 2.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 8236.741273
Minimum: 0
Maximum: 157500
Zeros: 19300
Zeros (%): 30.4%
Negative: 0
Negative (%): 0.0%
Memory size: 496.8 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 4000
Q3: 9250
95-th percentile: 36000
Maximum: 157500
Range: 157500
Interquartile range (IQR): 9250

Descriptive statistics

Standard deviation: 14338.29869
Coefficient of variation (CV): 1.740773228
Kurtosis: 25.29311766
Mean: 8236.741273
Median Absolute Deviation (MAD): 4000
Skewness: 4.058457769
Sum: 512267650
Variance: 205586809.3
Monotonicity: Not monotonic
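
Several of the LATE_FILING_AMT figures are tied together by simple identities, which gives a quick consistency check on the report. The sketch below uses only values printed in this report and assumes that missing values are excluded before the statistics are computed. The variable names are illustrative.

```python
# Values copied from the LATE_FILING_AMT summary in this report.
n_rows = 63578        # number of observations
n_missing = 1385      # missing values for this variable
total = 512267650     # reported Sum
std = 14338.29869     # reported standard deviation
q1, q3 = 0, 9250      # reported quartiles
minimum, maximum = 0, 157500

n_valid = n_rows - n_missing   # assumption: only non-missing rows enter the statistics
mean = total / n_valid         # Mean = Sum / non-missing count
cv = std / mean                # Coefficient of variation = standard deviation / mean
variance = std ** 2            # Variance = standard deviation squared
iqr = q3 - q1                  # Interquartile range = Q3 - Q1
value_range = maximum - minimum

print(f"mean={mean:.6f} cv={cv:.6f} variance={variance:.1f} iqr={iqr} range={value_range}")
```

A CV well above 1 together with a median of 4000 against a mean above 8000 is consistent with the strong right skew reported for this variable.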
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0 | 19300 | 30.4%
4000 | 6828 | 10.7%
3000 | 905 | 1.4%
250 | 844 | 1.3%
7250 | 731 | 1.1%
6000 | 664 | 1.0%
1000 | 624 | 1.0%
1500 | 619 | 1.0%
31000 | 569 | 0.9%
4250 | 545 | 0.9%
Other values (663) | 30564 | 48.1%
(Missing) | 1385 | 2.2%

Value | Count | Frequency (%)
0 | 19300 | 30.4%
150 | 368 | 0.6%
250 | 844 | 1.3%
300 | 124 | 0.2%
400 | 91 | 0.1%
450 | 149 | 0.2%
500 | 491 | 0.8%
600 | 86 | 0.1%
650 | 37 | 0.1%
700 | 6 | < 0.1%

Value | Count | Frequency (%)
157500 | 32 | 0.1%
149000 | 24 | < 0.1%
144500 | 25 | < 0.1%
141500 | 43 | 0.1%
134000 | 5 | < 0.1%
117500 | 4 | < 0.1%
116500 | 2 | < 0.1%
102500 | 5 | < 0.1%
101000 | 4 | < 0.1%
99500 | 9 | < 0.1%

FAILURE_TO_FILE_AMT
Real number (ℝ≥0)

HIGH CORRELATION
MISSING
ZEROS

Distinct: 31
Distinct (%): < 0.1%
Missing: 1379
Missing (%): 2.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 2119.037284
Minimum: 0
Maximum: 39000
Zeros: 41113
Zeros (%): 64.7%
Negative: 0
Negative (%): 0.0%
Memory size: 496.8 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 1000
95-th percentile: 15000
Maximum: 39000
Range: 39000
Interquartile range (IQR): 1000

Descriptive statistics

Standard deviation: 5222.800008
Coefficient of variation (CV): 2.46470416
Kurtosis: 16.57271014
Mean: 2119.037284
Median Absolute Deviation (MAD): 0
Skewness: 3.701956916
Sum: 131802000
Variance: 27277639.92
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=31)
Value | Count | Frequency (%)
0 | 41113 | 64.7%
1000 | 6042 | 9.5%
2000 | 4233 | 6.7%
5000 | 1629 | 2.6%
3000 | 1480 | 2.3%
16000 | 1048 | 1.6%
15000 | 1000 | 1.6%
17000 | 961 | 1.5%
6000 | 880 | 1.4%
4000 | 833 | 1.3%
Other values (21) | 2980 | 4.7%
(Missing) | 1379 | 2.2%

Value | Count | Frequency (%)
0 | 41113 | 64.7%
1000 | 6042 | 9.5%
2000 | 4233 | 6.7%
3000 | 1480 | 2.3%
4000 | 833 | 1.3%
5000 | 1629 | 2.6%
6000 | 880 | 1.4%
7000 | 516 | 0.8%
8000 | 180 | 0.3%
9000 | 138 | 0.2%

Value | Count | Frequency (%)
39000 | 196 | 0.3%
37000 | 3 | < 0.1%
36000 | 173 | 0.3%
33000 | 107 | 0.2%
29000 | 80 | 0.1%
28000 | 21 | < 0.1%
26000 | 15 | < 0.1%
23000 | 23 | < 0.1%
22000 | 2 | < 0.1%
21000 | 5 | < 0.1%

FAILURE_TO_COLLECT_AMT
Real number (ℝ≥0)

MISSING
SKEWED
ZEROS

Distinct: 477
Distinct (%): 0.8%
Missing: 1200
Missing (%): 1.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 4051.667575
Minimum: 0
Maximum: 1048000
Zeros: 51488
Zeros (%): 81.0%
Negative: 0
Negative (%): 0.0%
Memory size: 496.8 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 24000
Maximum: 1048000
Range: 1048000
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 18732.42146
Coefficient of variation (CV): 4.623385584
Kurtosis: 954.9867386
Mean: 4051.667575
Median Absolute Deviation (MAD): 0
Skewness: 20.51379133
Sum: 252734920
Variance: 350903613.7
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0 | 51488 | 81.0%
1000 | 1361 | 2.1%
2000 | 754 | 1.2%
3000 | 621 | 1.0%
5000 | 573 | 0.9%
4000 | 514 | 0.8%
6000 | 462 | 0.7%
8000 | 357 | 0.6%
9000 | 342 | 0.5%
7000 | 332 | 0.5%
Other values (467) | 5574 | 8.8%
(Missing) | 1200 | 1.9%

Value | Count | Frequency (%)
0 | 51488 | 81.0%
1000 | 1361 | 2.1%
2000 | 754 | 1.2%
3000 | 621 | 1.0%
4000 | 514 | 0.8%
5000 | 573 | 0.9%
6000 | 462 | 0.7%
7000 | 332 | 0.5%
8000 | 357 | 0.6%
9000 | 342 | 0.5%

Value | Count | Frequency (%)
1048000 | 6 | < 0.1%
294900 | 3 | < 0.1%
262000 | 4 | < 0.1%
231300 | 6 | < 0.1%
211600 | 3 | < 0.1%
207600 | 3 | < 0.1%
205800 | 3 | < 0.1%
204600 | 3 | < 0.1%
202940 | 4 | < 0.1%
199200 | 4 | < 0.1%

COMMENTS
Categorical

HIGH CARDINALITY
MISSING

Distinct: 9038
Distinct (%): 50.7%
Missing: 45747
Missing (%): 72.0%
Memory size: 496.8 KiB

Value | Count
N.Y.C.H.A | 447
RESUBMISSION | 315
ALTERNATIVE PROGRAM | 243
DATA ENTERED BY RT | 197
NEW YORK CITY HOUSING AUTHORITY | 176
Other values (9033) | 16453

Length

Max length: 102
Median length: 88
Mean length: 60.11339801
Min length: 1

Characters and Unicode

Total characters: 1071882
Distinct characters: 88
Distinct categories: 13
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 5568
Unique (%): 31.2%

Sample

1st rowPHILIP DEANS - OWNER - PHN# 212-673-6262EMAIL: PDEANS@CABRINIELDERCARE.ORG
2nd rowREPORT FILED 6/30/15 WAS REJECTED
3rd rowREPORT FILED 6/30/15 WAS REJECTED
4th rowINITIAL REPORT FILED 6/30/15 WAS REJECTED
5th rowINITIAL REPORT FILED 02/08/2016 WAS REJECTEDGAIL WEINER - ASSISTANT SECRETARY - PHN# 212-753-3381EMA

Common Values

Value | Count | Frequency (%)
N.Y.C.H.A | 447 | 0.7%
RESUBMISSION | 315 | 0.5%
ALTERNATIVE PROGRAM | 243 | 0.4%
DATA ENTERED BY RT | 197 | 0.3%
NEW YORK CITY HOUSING AUTHORITY | 176 | 0.3%
CITY OWNED | 150 | 0.2%
CITY OWNED NO PENALTY | 122 | 0.2%
AMENDED FILING: OWNER: MARTHA BRAZOBAN | 120 | 0.2%
ADDED TO CYCLE 6 | 41 | 0.1%
SUBSEQUENT | 38 | 0.1%
Other values (9028) | 15982 | 25.1%
(Missing) | 45747 | 72.0%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
5877
 
3.9%
to4204
 
2.8%
on3646
 
2.4%
penalty2730
 
1.8%
filing2585
 
1.7%
report2576
 
1.7%
civil2382
 
1.6%
penalties2364
 
1.6%
stopped2326
 
1.5%
and2296
 
1.5%
Other values (14229)121481
79.7%

Most occurring characters

ValueCountFrequency (%)
135020
 
12.6%
E71560
 
6.7%
I49344
 
4.6%
A47969
 
4.5%
T46728
 
4.4%
N41782
 
3.9%
R40209
 
3.8%
S34521
 
3.2%
O32768
 
3.1%
131857
 
3.0%
Other values (78)540124
50.4%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter595342
55.5%
Decimal Number152367
 
14.2%
Space Separator135020
 
12.6%
Lowercase Letter108321
 
10.1%
Other Punctuation55445
 
5.2%
Dash Punctuation12574
 
1.2%
Open Punctuation6365
 
0.6%
Close Punctuation6116
 
0.6%
Currency Symbol277
 
< 0.1%
Math Symbol38
 
< 0.1%
Other values (3)17
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
E71560
12.0%
I49344
 
8.3%
A47969
 
8.1%
T46728
 
7.8%
N41782
 
7.0%
R40209
 
6.8%
S34521
 
5.8%
O32768
 
5.5%
D29776
 
5.0%
L28542
 
4.8%
Other values (16)172143
28.9%
Lowercase Letter
ValueCountFrequency (%)
e13178
12.2%
n12085
11.2%
t11732
10.8%
i10810
10.0%
a9857
9.1%
l8986
8.3%
o8104
7.5%
d7656
7.1%
p5352
 
4.9%
s4843
 
4.5%
Other values (16)15718
14.5%
Other Punctuation
ValueCountFrequency (%)
/26065
47.0%
.10003
 
18.0%
:8425
 
15.2%
#3686
 
6.6%
,3512
 
6.3%
@2386
 
4.3%
&1169
 
2.1%
'160
 
0.3%
;28
 
0.1%
?5
 
< 0.1%
Other values (2)6
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
131857
20.9%
226443
17.4%
026401
17.3%
616103
10.6%
511076
 
7.3%
88854
 
5.8%
38640
 
5.7%
47984
 
5.2%
97733
 
5.1%
77276
 
4.8%
Math Symbol
ValueCountFrequency (%)
+31
81.6%
>3
 
7.9%
<3
 
7.9%
=1
 
2.6%
Open Punctuation
ValueCountFrequency (%)
(6362
> 99.9%
{3
 
< 0.1%
Close Punctuation
ValueCountFrequency (%)
)6113
> 99.9%
}3
 
< 0.1%
Space Separator
ValueCountFrequency (%)
135020
100.0%
Dash Punctuation
ValueCountFrequency (%)
-12574
100.0%
Currency Symbol
ValueCountFrequency (%)
$277
100.0%
Connector Punctuation
ValueCountFrequency (%)
_10
100.0%
Other Number
ValueCountFrequency (%)
½4
100.0%
Modifier Symbol
ValueCountFrequency (%)
`3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin703663
65.6%
Common368219
34.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
E71560
 
10.2%
I49344
 
7.0%
A47969
 
6.8%
T46728
 
6.6%
N41782
 
5.9%
R40209
 
5.7%
S34521
 
4.9%
O32768
 
4.7%
D29776
 
4.2%
L28542
 
4.1%
Other values (42)280464
39.9%
Common
ValueCountFrequency (%)
135020
36.7%
131857
 
8.7%
226443
 
7.2%
026401
 
7.2%
/26065
 
7.1%
616103
 
4.4%
-12574
 
3.4%
511076
 
3.0%
.10003
 
2.7%
88854
 
2.4%
Other values (26)63823
17.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII1071870
> 99.9%
None12
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
135020
 
12.6%
E71560
 
6.7%
I49344
 
4.6%
A47969
 
4.5%
T46728
 
4.4%
N41782
 
3.9%
R40209
 
3.8%
S34521
 
3.2%
O32768
 
3.1%
131857
 
3.0%
Other values (75)540112
50.4%
None
ValueCountFrequency (%)
¿4
33.3%
½4
33.3%
ï4
33.3%

Interactions

[Figure: 49 pairwise interaction plots]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and 1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
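
The calculation just described can be sketched in plain Python: rank both variables, then divide the covariance of the ranks by the product of their standard deviations. This is an illustrative sketch, not the code used to produce this report. The function names are hypothetical, and tied values are given the average of their ranks, which is a common convention assumed here.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def spearman_rho(xs, ys):
    """Covariance of the rank variables over the product of their standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least two values")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in ry) / n)
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (sx * sy)

# A cubic relationship is nonlinear but monotonic, so the ranks agree exactly.
rho = spearman_rho([1, 2, 3, 4, 5], [1, 8, 27, 64, 125])
```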

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and 1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
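
As a hedged illustration of this definition and of the invariance property, the sketch below computes r in plain Python. The function name `pearson` and the toy data are assumptions made for the example, and the invariance is checked for positive scale factors only.

```python
def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    # The 1/n factors of the covariance and standard deviations cancel out.
    return cov / (sx * sy)


# Toy data (made up for illustration).
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r = pearson(x, y)

# Shift and rescale each variable separately (positive scale factors).
x_shifted = [2 * v + 5 for v in x]
y_shifted = [0.5 * v - 3 for v in y]
r_shifted = pearson(x_shifted, y_shifted)

# Negating one variable flips the sign of r but not its magnitude.
r_flipped = pearson(x, [-v for v in y])
```

Here `r` and `r_shifted` agree up to floating-point rounding, while `r_flipped` has the same magnitude with the opposite sign.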

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
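
The pair-counting rule above corresponds to the variant usually called tau-a. The following plain-Python sketch implements exactly that rule; the name `kendall_tau_a` and the toy data are assumptions for illustration. Statistical libraries often use a tie-adjusted variant such as tau-b, so values can differ from this sketch when the data contain ties.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables order the pair the same way
        elif sign < 0:
            discordant += 1   # the variables order the pair in opposite ways
        # sign == 0 means a tie in x or y; it counts toward neither group
    total = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / total


# Toy data (made up): one of the six pairs is out of order.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau_a(x, y)  # (5 - 1) / 6
```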

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
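
A minimal sketch of a bias-corrected Cramér's V, following the correction commonly attributed to Bergsma (2013), is shown below. It is not necessarily the exact implementation behind this report: the function name `cramers_v_corrected` and the toy contingency tables are assumptions, no continuity correction is applied, and every row and column total is assumed to be positive.

```python
from math import sqrt


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))


# Toy tables (made up): independent counts and perfectly associated counts.
v_independent = cramers_v_corrected([[10, 20], [20, 40]])
v_perfect = cramers_v_corrected([[50, 0], [0, 50]])
```

The two toy tables land at the ends of the range described above: 0 for the independent table and 1 for the perfectly associated one.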

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

Columns: TR6_NO | CONTROL_NO | FILING_TYPE | CYCLE | BIN | HOUSE_NO | STREET_NAME | BOROUGH | BLOCK | LOT | SEQUENCE_NO | SUBMITTED_ON | CURRENT_STATUS | QEWI_NAME | QEWI_BUS_NAME | QEWI_BUS_STREET_NAME | QEWI_CITY | QEWI_STATE | QEWI_ZIP | QEWI_NYS_LIC_NO | OWNER_NAME | OWNER_BUS_NAME | OWNER_BUS_STREET_NAME | OWNER_CITY | OWNER_ZIP | OWNER_STATE | FILING_DATE | FILING_STATUS | PRIOR_CYCLE_FILING_DATE | PRIOR_STATUS | FIELD_INSPECTION_COMPLETED_DATE | QEWI_SIGNED_DATE | LATE_FILING_AMT | FAILURE_TO_FILE_AMT | FAILURE_TO_COLLECT_AMT | COMMENTS
0TR6-913448-9A-N1913448Auto-Generated94114712.0143-45SANFORD AVENUEQUEENS5049381.0NaNNo Report FiledNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNo Report FiledNaNNaNNaNNaN11750.01000.00.0NaN
1TR6-913451-9A-N1913451Auto-Generated93393807.015OLIVER STREETBROOKLYN609912.0NaNUNSAFENaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNo Report FiledNaNNaNNaNNaN0.00.063400.0NaN
2TR6-913456-9A-N1913456Auto-Generated91077623.0180ELDRIDGE STREETMANHATTAN415122.0NaNNo Report FiledNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNo Report FiledNaNNaNNaNNaN4250.00.00.0NaN
3TR6-913458-9A-N1913458Auto-Generated94001141.041-4650 STREETQUEENS13411.0NaNNo Report FiledNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNo Report FiledNaNNaNNaNNaN13250.02000.01000.0NaN
4TR6-913460-9A-N1913460Auto-Generated91088779.0220EAST 19 STREETMANHATTAN899461.0NaNSAFENaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNo Report FiledNaNNaNNaNNaN500.00.00.0PHILIP DEANS - OWNER - PHN# 212-673-6262EMAIL: PDEANS@CABRINIELDERCARE.ORG
5TR6-913471-9A-N1913471Auto-Generated91030341.0100AMSTERDAM AVENUEMANHATTAN1156301.0NaNNo Report FiledNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNo Report FiledNaNNaNNaNNaN4000.00.00.0NaN
6TR6-913472-9A-N1913472Auto-Generated91018503.0160EAST 34 STREETMANHATTAN889501.0NaNNo Report FiledNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNo Report FiledNaNNaNNaNNaN4750.00.00.0NaN
7TR6-913473-9A-N1913473Auto-Generated91087286.0300WEST 135 STREETMANHATTAN195975011.0NaNNo Report FiledNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNo Report FiledNaNNaNNaNNaN11500.02000.020000.0NaN
8TR6-913479-9A-N1913479Auto-Generated94223678.0190-05HILLSIDE AVENUEQUEENS10499751.0NaNNo Report FiledNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNo Report FiledNaNNaNNaNNaN9750.01000.00.0NaN
9TR6-913480-9A-N1913480Auto-Generated93337151.018173 STREETBROOKLYN5906181.0NaNNo Report FiledNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNo Report FiledNaNNaNNaNNaN27500.01000.00.0NaN

Last rows

Columns: TR6_NO | CONTROL_NO | FILING_TYPE | CYCLE | BIN | HOUSE_NO | STREET_NAME | BOROUGH | BLOCK | LOT | SEQUENCE_NO | SUBMITTED_ON | CURRENT_STATUS | QEWI_NAME | QEWI_BUS_NAME | QEWI_BUS_STREET_NAME | QEWI_CITY | QEWI_STATE | QEWI_ZIP | QEWI_NYS_LIC_NO | OWNER_NAME | OWNER_BUS_NAME | OWNER_BUS_STREET_NAME | OWNER_CITY | OWNER_ZIP | OWNER_STATE | FILING_DATE | FILING_STATUS | PRIOR_CYCLE_FILING_DATE | PRIOR_STATUS | FIELD_INSPECTION_COMPLETED_DATE | QEWI_SIGNED_DATE | LATE_FILING_AMT | FAILURE_TO_FILE_AMT | FAILURE_TO_COLLECT_AMT | COMMENTS
63568TR6-816512-8C-I1816512Initial81088616.016WEST 21 STREETMANHATTAN8227505NaN2017-12-18 00:00:00SAFEBARIS ACARPACE ENGINEERING P.C.183 MADISON AVENUENEW YORKNY10016PE - 088950ALEX KIRKPRNaNNaNNaNNaN12/18/2017 12:00:00 AMSAFENaNNaN10/25/2017 12:00:00 AM09/23/2017 08:00:00 PM0.00.00.0NaN
63569TR6-816565-8B-I1816565Initial83327373.0725CHURCH AVENUEBROOKLYN533024NaN2018-02-20 00:00:00SAFEANDREW KATZANDREW KATZ ENGINEERS3452 BEDFORD AVEBROOKLYNNY11210PE - 051094LAWRENCE BERNSTEINJONAS EQUITIES, INC.NaNNaNNaNNaN10/05/2018 12:00:00 AMSAFENaNNaN02/14/2018 12:00:00 AM06/30/2018 08:00:00 PM2000.00.00.0NaN
63570TR6-916956-9A-I1916956Initial93396957.0185OCEAN AVENUEBROOKLYN50267501NaN2020-03-12 00:00:00SAFEANDREW KATZANDREW KATZ ENGINEERS3452 BEDFORD AVEBROOKLYNNY11210PE - 051094JOSH SHINEPRNaNNaNNaNNaN03/12/2020 12:00:00 AMSAFENaNNaN04/02/2020 12:00:00 AM03/30/2020 08:00:00 PM0.00.00.0NaN
63571TR6-806107-8B-I1806107Initial81052803.0321EAST 108 STREETMANHATTAN1680131.02016-11-22 00:00:00SAFEANDREW KATZNaN3452 BEDFORD AVEBROOKLYNNY11210PE - 051094ROBERT GORDONAJ Clarke Real Estate Corp.NaNNaNNaNNaN11/22/2016 12:00:00 AMSAFE07/12/2011 12:00:00 AMSWARMP09/28/2016 12:00:00 AM11/11/2016 12:00:00 AM10750.00.00.0INITIAL REPORT FILED 5/25/16 WAS REJECTED
63572TR6-801841-8C-I1801841Initial81015054.0154WEST 27 STREETMANHATTAN802711.02017-11-29 00:00:00SWARMPANDREW KATZANDREW KATZ ENGINEERS3452 BEDFORD AVEBROOKLYNNY11210PE - 051094ISAAC SCHWARTZWest End Estates LLCNaNNaNNaNNaN12/06/2018 12:00:00 AMSWARMP05/11/2007 12:00:00 AMSAFE12/03/2018 12:00:00 AM11/26/2018 07:00:00 PM17950.05000.00.0NaN
63573TR6-801722-8B-I1801722Initial81014471.05768 AVENUEMANHATTAN78841.02016-12-22 00:00:00SWARMPANDREW KATZNaN3452 BEDFORD AVEBROOKLYNNY11210PE - 051094STEVEN GREEN580 8th Ave RealtyNaNNaNNaNNaN12/22/2016 12:00:00 AMSWARMP05/10/2012 12:00:00 AMSAFE10/28/2016 12:00:00 AM12/14/2016 12:00:00 AM3750.00.00.0NaN
63574TR6-814017-8B-I1814017Initial82114714.01514SEDGWICK AVENUEBRONX288091.02019-06-11 00:00:00SAFEANDREW KATZANDREW KATZ ENGINEERS3452 BEDFORD AVEBROOKLYNNY11210PE - 051094NOEMI MARTINEZSEDGWICK RIVERVIEW, L.P.NaNNaNNaNNaN06/11/2019 12:00:00 AMSAFENaNNaN06/05/2019 12:00:00 AM07/22/2019 08:00:00 PM4000.01000.00.0NaN
63575TR6-814521-8A-I1814521Initial83017631.026622 STREETBROOKLYN899221.02019-01-09 00:00:00SWARMPANDREW KATZANDREW KATZ ENGINEERS3452 BEDFORD AVEBROOKLYNNY11210PE - 051094JACK LOCICEROSOUTH SLOPE REALTY OF BROOKLYN, INCNaNNaNNaNNaN01/09/2019 12:00:00 AMSWARMPNaNNaN12/14/2018 12:00:00 AM12/06/2018 07:00:00 PM9750.01000.00.0ADDED TO FISP UNIVERSE 8/11/2015 (CAW)FINAL C.O. 07/27/2004
63576TR6-810327-8C-I1810327Initial83201014.02675OCEAN AVENUEBROOKLYN7381791.02018-10-31 00:00:00SWARMPANDREW KATZANDREW KATZ ENGINEERS3452 BEDFORD AVEBROOKLYNNY11210PE - 051094ELIE GABAYOcean Road Terrace Coop Apts IncNaNNaNNaNNaN10/31/2018 12:00:00 AMSWARMP05/08/2013 12:00:00 AMSWARMP10/23/2018 12:00:00 AM12/28/2018 07:00:00 PM1800.00.00.0NaN
63577TR6-810947-8C-I1810947Initial84440232.099-6063 ROADQUEENS211175013.02020-01-16 00:00:00No Report FiledANDREW KATZANDREW KATZ ENGINEERS3452 BEDFORD AVEBROOKLYNNY11210PE - 051094ISAK RADONCICCOMPREHENSIVE DESIGNSNaNNaNNaNNaNNaNNo Report Filed01/12/2007 12:00:00 AMSAFE08/24/2020 12:00:00 AM08/27/2020 08:00:00 PM80000.036000.00.0NaN